Preface



The International Conference on Networking (ICN'01) is the first conference in its
series aimed at stimulating technical exchange in the emerging and important field of
networking. On behalf of the International Advisory Committee, it is our great
pleasure to welcome you to the International Conference on Networking. Integration
of fixed and portable wireless access into IP and ATM networks presents a
cost-effective and efficient way to provide seamless end-to-end connectivity and
ubiquitous access in a market where demands on mobile and cellular networks have
grown rapidly and are predicted to generate billions of dollars in revenue. The
deployment of broadband IP-based technologies over Dense Wavelength Division
Multiplexing (DWDM) and the integration of IP with broadband wireless access
networks (BWANs) are becoming increasingly important. In addition, fixed core
IP/ATM networks are being constructed with the recent move to IP/MPLS over DWDM.
Moreover, mobility introduces further challenges in the area that have neither been fully
understood nor resolved in the preceding network generation. This first conference,
ICN'01, has been very well received by the international networking community. A
total of 300 papers from 39 countries were submitted, of which 168 have been
accepted. Each paper has been reviewed by several members of the scientific
Program Committee.

The program covers a variety of research topics of current interest, such as
mobile and wireless networks, Internet, traffic control, QoS, switching techniques,
Voice over IP (VoIP), optical networks, Differentiated and Integrated services, IP and
ATM networks, routing techniques, multicasting and performance evaluation, testing,
and simulation and modeling. Together with four tutorials and four keynote
speeches, these technical presentations will address the latest research results from
international industry and academia and report on findings from mobile,
satellite, and personal communications in 3rd and 4th generation research projects
and standardization.

We would like to thank the scientific program committee members and the referees. 
Without their support, the program organization of this conference would not have 
been possible. We are also indebted to many individuals and organizations that made 
this conference possible (Association "Colmar-Liberty", GdR CNRS ARP, Ministère
de la Recherche, Université de Haute Alsace, Ville de Colmar, France Telecom,
IEEE, IEE, IST, WSES). In particular, we thank the members of the Organizing
Committee for their help in all aspects of the organization of this conference. 



We hope that you will enjoy this International Conference on Networking at Colmar,
France, and that you will find it a useful forum for the exchange of ideas, results,
and recent findings. We also hope that you will be able to spend some time visiting
Colmar, with its beautiful countryside and its major cultural attractions.
Abstract. A bandwidth management scheme is proposed that can support
seamless QoS in the face of handoff in mobile networks. The proposed scheme is
based on time-selective bandwidth reservation with reduced signaling
and computational overhead. The reservation parameters are adjusted dynamically
to cope with user mobility. The performance of the proposed scheme is evaluated
through computer simulations. The simulation results show that
the handoff call blocking probability can be remarkably improved with a slight
degradation of other parameters such as new call blocking probability and
bandwidth utilization efficiency.



1 Introduction 

The next generation high-speed mobile networks are expected to support multimedia 
applications. As such, it is important that these networks provide the quality-of- 
service (QoS) guarantees. QoS support for multimedia traffic has been extensively 
studied for wired networks. Supporting QoS in mobile networks is complicated due to 
user mobility and unreliable radio channels. This problem becomes even more chal- 
lenging as recent mobile networks tend to be deployed over small-size cells (i.e., 
micro-cells or pico-cells) to allow higher transmission capacity. 

One of the main objectives of IMT-2000 is to provide mobile users with multime- 
dia services. Internet services also play an important role in IMT-2000 as the Internet 
grows rapidly. For data services to be implemented efficiently, a separate mobile 
packet network based on either GPRS or Mobile IP has been suggested. To support 
QoS in mobile packet networks, somewhat different service strategies need to be 
developed with the mobile’s characteristics sufficiently taken into consideration [1]. 

First of all, the QoS architecture in mobile networks must be built on top of the
existing QoS concepts used in wired networks. Then, the effects of user mobility and
wireless communication channels should be incorporated into the architecture. In
particular, our focus is on the QoS features related to user mobility, which we call
‘mobile-QoS’. The mobile-QoS involves additional handoff-related parameters such as
the handoff blocking rate. To guarantee the mobile-QoS, bandwidth reservation is
regarded as one of the most reliable schemes. When a mobile is located in a cell,
bandwidth on the wireless channels of some neighbor cells is reserved in advance for the
mobile’s handoff. Unless properly managed, however, bandwidth reservation may
incur a significant waste of network resources.

The existing bandwidth reservation schemes may be divided into two categories:
static reservation and dynamic reservation [2]. Dynamic reservation is further
classified into two types: time-continuous reservation [3] and time-selective
reservation [4] [5]. The former reserves bandwidth on all neighbor cells from the time
a new call is generated until it terminates. In the latter case, bandwidth
reservation is done selectively on the neighbor cells according to the estimated arrival
time of the mobile at each cell. The latter clearly achieves better bandwidth
utilization than the former, but at the cost of increased signaling and computational overhead.

The purpose of QoS-based bandwidth management in mobile networks is to
provide the requested QoS of each call regardless of the mobile’s handoff while
maintaining the maximum utilization of network resources. To this end, this paper
proposes a new bandwidth management scheme that can support seamless
mobile-QoS in the face of handoff in mobile networks. The proposed scheme is essentially
based on the time-selective bandwidth reservation described in [4] and [5]. The
difference is that user mobility is estimated from aggregated measurements
instead of call-by-call computation. In this way, signaling and computational
overheads can be significantly reduced.

This paper is organized as follows. In Section 2, we establish a framework for 
bandwidth reservation and a reservation model to support the mobile-QoS. In Section 
3, we propose a reservation-based bandwidth management scheme. In Section 4, the 
performance of the proposed scheme is evaluated using computer simulations and 
some numerical results are provided. Finally, we conclude our work in Section 5. 



2 Mobile-QoS Framework

2.1 Reservation Parameters 

With regard to the bandwidth reservation for the mobile-QoS, we encounter three 
fundamental questions: where (the selection of neighbor cells to be reserved), when 
(the decision of starting time and ending time for reservation), and how much (the 
allocation of certain amount of bandwidth for reservation)? To answer these ques- 
tions, we introduce the following parameters: reservation range, reservation interval, 
and reservation bandwidth. 

Reservation Range. When a new call is generated in a particular cell, the reservation range
represents the set of neighbor cells in which a certain amount of bandwidth is reserved. The
reservation range should be properly chosen depending on the requested QoS and the user
mobility. If the reservation range is larger than required, bandwidth will be underutilized.
Conversely, if it is smaller, the requested QoS cannot be properly supported.

Reservation Interval. If a cell has been selected to be included in the reservation range, then
the next thing to do is to decide the reservation interval on the time axis. To do this, both the
arrival time and the departure time of the mobiles should be estimated as precisely as possible.
Based on these estimates, the starting time and the ending time of the reservation interval are
determined.

The arrival time is the sum of residence time in each cell traversed by the mobile. 
The residence time is again a function of cell size and user mobility. The departure 
time also depends on the residence time in the cell. In this case, however, the call 
holding time must be also considered. If we want to offer better mobile-QoS, some 
guard times can be appended to both ends of the interval. 
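As a rough illustration of the interval computation described above, the following Python sketch sums per-cell residence-time estimates to obtain the arrival time at a target cell, adds the residence time in that cell to obtain the departure time, caps it by the call holding time, and pads both ends with a guard time. The function name, its parameters, and the idea of passing residence estimates as a list are assumptions made for illustration; they are not the authors' notation.

```python
from typing import List, Optional, Tuple


def reservation_interval(
    residence_times: List[float],
    target_index: int,
    holding_time: float,
    guard: float = 0.0,
) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the reservation in the target cell, in seconds
    measured from call generation, or None if the call is expected to end
    before the mobile arrives.

    residence_times[k] is the estimated residence time in the k-th cell on
    the mobile's path; index 0 is the cell where the call is generated.
    """
    # Arrival time: sum of residence times in the cells traversed before
    # the target cell.
    arrival = sum(residence_times[:target_index])
    if arrival >= holding_time:
        return None  # the call is expected to terminate before arriving
    # Departure time: arrival plus residence in the target cell, but never
    # later than the end of the call (call holding time).
    departure = min(arrival + residence_times[target_index], holding_time)
    # Guard times widen the interval at both ends (start clamped at zero).
    return max(0.0, arrival - guard), departure + guard
```

For example, with hypothetical residence estimates of 60, 70, and 80 seconds, a 180-second holding time, and a 10-second guard, the reservation in the second cell would span 50 to 140 seconds after call generation.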

Reservation Bandwidth. Once the reservation interval has been set, the amount of 
reservation bandwidth is also an important factor that has direct impacts on the mo- 
bile-QoS. The reservation bandwidth is basically related to the required mobile-QoS. 
Unlike the fixed network, the reserved bandwidth would be proportional to the re- 
quired bandwidth as well as the required handoff blocking probability. 

Table 1 indicates the network or mobile characteristics that affect the reservation 
parameters. 



Table 1. Reservation parameters 
2.2 Design Principles

Recall that the objective of bandwidth management is to maximize the utilization of
network resources and at the same time meet the QoS requirements. In determining the
reservation parameters, there naturally arises a trade-off between bandwidth utilization
and the degree of mobile-QoS provided. From the viewpoint of implementation, an
additional important factor to be considered is simplicity, as measured by the
signaling and computation required.

The most straightforward way to support the mobile-QoS is overprovisioning.
Now, if we estimate the user mobility more accurately, the same level of mobile-QoS
could be supported with less bandwidth. However, this can be achieved only
at the expense of simplicity, due to the increased signaling and computational
overheads.

S-H. Lee, D.-S. Jung, and S.-W. Park

Let us take an example to show this trade-off in more detail. Depending on
whether or not we allow the reservation range to vary when handoff occurs, there are
two alternatives: static or dynamic. The reservation parameters are fixed while a call
is in progress (static reservation range), or they may be updated whenever handoff
occurs (dynamic reservation range). The latter provides better mobile-QoS and
bandwidth utilization, but incurs more overhead.

Bearing these facts in mind, we describe our design principles for bandwidth
reservation. The basic idea is to support the mobile-QoS with an emphasis on
simplicity (minimum signaling and computational overhead). First, bandwidth
reservation is synchronized to time slots, as in [4]. This reduces the computational
overhead significantly compared to the case with no time slots. The size of the time
slot must be carefully chosen so as not to degrade the overall network performance.
Secondly, the estimation of user mobility is performed by measurement only. Rather
than relying on computation, user mobility is estimated from the aggregated behavior of
the mobiles that have arrived at each cell. Thirdly, the reservation parameters are
decided in a distributed way. This keeps the required signaling
among cells at a minimum level. Finally, bandwidth reservation is done only once per
call, when it is generated. This also reduces both the signaling and computational
overheads, even though the mobile-QoS can be worse to some extent.



2.3 Reservation Model 

Now we describe the basic reservation model on which the bandwidth management
scheme is based. Figure 1 shows a typical architecture of mobile networks. As shown
in Figure 1, each cell can be modeled as a hexagon. Centered
on a particular cell (cell A), neighbor cells belong to one of the rings depending on
their distance from the center cell. That is, ring-i indicates the set of neighbor cells that
are i hops away from the center cell. Accordingly, cell B and cell C belong to ring-1
and ring-2, respectively. Ring-0 automatically implies the center cell itself.
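The ring structure can be sketched in a few lines of Python. This is only an illustration under an assumption the paper does not make explicit: cells are addressed with axial coordinates (q, r) on a hexagonal grid, for which the hop distance has a well-known closed form. The names `ring_index` and `ring` are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # axial coordinates (q, r) of a hexagonal cell


def ring_index(center: Cell, cell: Cell) -> int:
    """Hop distance between two cells on a hexagonal grid (axial coords)."""
    dq = cell[0] - center[0]
    dr = cell[1] - center[1]
    # The sum below is always even, so integer division is exact.
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def ring(center: Cell, i: int) -> List[Cell]:
    """All cells that are exactly i hops away from the center (ring-i)."""
    if i == 0:
        return [center]  # ring-0 is the center cell itself
    q0, r0 = center
    return [
        (q0 + dq, r0 + dr)
        for dq in range(-i, i + 1)
        for dr in range(-i, i + 1)
        if ring_index((0, 0), (dq, dr)) == i
    ]
```

Under this addressing, ring-i contains 6i cells for i ≥ 1, which gives a quick sense of how the number of cells holding reservations grows with the reservation range.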




Fig. 1. Cell architecture 



Now suppose that two calls requiring bandwidth reservation were generated, one
from cell B and one from cell C. Assume also that the reservation range of cell A is at
least two hops, so that both cells fall within the reservation range of cell A. Figure 2
shows the reservation intervals of these calls, which are referred to as call B and call C,
respectively. From Figure 2, we can note some facts regarding the bandwidth
reservation. Comparing the starting times of the two reservation intervals, we can
conjecture that the estimated speed of mobile C is slower than that of mobile B, given
that the path lengths traversed by the two mobiles are the same. It is possible for call C’s
starting time to be placed ahead of call B’s starting time if mobile C moves much faster
than mobile B. If we assume that the two mobiles move at the same speed and traverse
different paths, similar arguments can be applied to the estimated path length. We also
observe that the amount of bandwidth reserved for call C is less than that for call B.
This is due to the fact that cell C is located farther away from cell A than cell B is,
even though the two mobiles request the same level of mobile-QoS. It is also possible
for the amount of bandwidth reserved for call C to surpass that of call B if call C
requires more stringent mobile-QoS.



[Figure 2: timeline of reservation slots. Labels: current time slot (two calls are generated from cell B and cell C); reservation interval for call B; reservation interval for call C.]

Fig. 2. Structure of reservation slot



3 Bandwidth Management 



The proposed bandwidth management scheme consists of two parts: adaptive control 
of reservation parameters and the associated connection admission control. 



3.1 Reservation Control 

The reservation parameters mainly depend on network architecture and user mobility.
Given the network architecture, the offered mobile-QoS depends on how precisely
user mobility can be estimated. As stated above, a more accurate estimation
inevitably incurs additional signaling and computational overheads. This may be
prohibitive in situations where fast response time is required.

Our approach to overcoming these limitations is to estimate user mobility by a
posteriori measurement instead of a priori computation. That is, user mobility is derived
from the collected statistics of the mobiles that have arrived. Then, the reservation
parameters are dynamically adjusted based on this measurement-based estimation. By
doing this, it is possible to control the reservation parameters adaptively, reflecting the
current status of the network. We describe the decision criteria for adjusting the
reservation parameters in more detail below.

Reservation Range. For a particular cell, the frequent arrivals of the mobile-QoS calls with no 
reservation can be interpreted as the indication that the current reservation range is too small. 
This can happen when the call holding time is larger or the required QoS is more stringent than 
estimated. Therefore, if the number of handoff requests by the mobile-QoS calls with no reser- 
vation at the cell increases beyond a certain level, we need to increase the corresponding reser- 
vation range. On the other hand, if the number of mobile-QoS calls with reservation that do not 
arrive at the cell increases beyond a certain level, we need to decrease the current reservation 
range. To do this, we require that each cell maintain the list of mobiles that have reserved
bandwidth. For practical purposes, this list can be maintained within a limited range of time
slots (reservation window).

If we are able to identify the origin cell of each handoff call, it is possible to check for each
neighbor cell separately whether it belongs to the reservation range. We call this a non-uniform
reservation range, since only a portion of the neighbor cells in the same ring may join the
reservation range. For simplicity, the reservation range may instead contain every neighbor cell
in the same ring, which is called a uniform reservation range.

Reservation Interval. Note that the reservation interval consists of the starting time and the
ending time of a mobile with reservation. The starting time must be able to move forward or
backward depending on whether the mobile-QoS calls arrive earlier or later than reserved. On
the other hand, the ending time represents the estimation of cell residence time and is
dependent on the departure time of the mobile-QoS calls leaving the cell. Similar to the starting
time, the ending time also moves back and forth depending on whether the mobile-QoS calls
leave the current cell earlier or later than expected.

Reservation Bandwidth. Even if a mobile-QoS call arrives on time at the reserved cell,
the handoff request can be blocked when it finds no available bandwidth. The natural remedy
for this type of blocking is to increase the amount of reservation bandwidth.
That is, if handoff blocking of reserved calls happens too frequently, the amount of
bandwidth per reservation must be increased. Conversely, if the reserved bandwidth is
heavily underutilized, we should reduce the amount of reservation bandwidth and leave more
room for newly generated calls.

We describe the proposed management scheme in the form of pseudo code. 

// Reservation Range:
if (NR1 > TR1)
    if ( (NR2 / NR1 > TRH) and (RR < RRmax) )
        { RR++; NR1=0; NR2=0; }
    else if ( (NR2 / NR1 < TRL) and (RR > RRmin) )
        { RR--; NR1=0; NR2=0; }
Where,
    NR1 : Number of QoS handoff requests
    NR2 : Number of QoS handoff requests with no reservation
    TR1 : Decision value to perform range adjustment
    TRH : Comparison threshold for increment
    TRL : Comparison threshold for decrement
    RR : Reservation range
    RRmax : Maximum reservation range
    RRmin : Minimum reservation range



// Reservation Bandwidth:
if (NB1 > TB1)
    if ( (NB2 / NB1 < TBL) and (BR < BRmax) )
        { BR++; NB1=0; NB2=0; }
    else if ( (NB2 / NB1 > TBH) and (BR > BRmin) )
        { BR--; NB1=0; NB2=0; }
Where,
    NB1 : Number of QoS handoff requests
    NB2 : Number of accepted QoS handoff requests
    TB1 : Decision value to perform bandwidth adjustment
    TBL : Lower threshold on the acceptance ratio NB2/NB1 (below it, BR is incremented)
    TBH : Upper threshold on the acceptance ratio NB2/NB1 (above it, BR is decremented)
    BR : Amount of the bandwidth reservation
    BRmax : Maximum amount of the reservation bandwidth
    BRmin : Minimum amount of the reservation bandwidth

// Reservation Interval (Starting Time):
if (NIS > TIS1)
    if ( (NIE / NIS > TIS2) and (VIS > TImin) )
        { VIS--; NIE=0; NIS=0; }
    else if ( (NIL / NIS > TIS2) and (VIS < TImax) )
        { VIS++; NIL=0; NIS=0; }
Where,
    NIS : Number of QoS handoff requests
    TIS1 : Decision value to perform slot interval adjustment
    TIS2 : Comparison threshold for increment or decrement
    NIE : Number of early arrived QoS handoff calls
    NIL : Number of late arrived QoS handoff calls
    VIS : Reservation interval (i.e., starting time)
    TImax : Maximum reservation interval
    TImin : Minimum reservation interval

// Reservation Interval (Ending Time):
if (NSO > TIS2)
    if (VSO / NSO > AR)
        { AR++; NSO=0; VSO=0; }
    else if (VSO / NSO < AR)
        { AR--; NSO=0; VSO=0; }
Where,
    NSO : Number of handoff QoS calls
    VSO : Residence time of the handoff QoS calls
    TIS2 : Decision value to perform res. interval adjustment
    AR : Reservation interval (i.e., ending time)
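To make the measurement-based adjustment concrete, here is a minimal Python sketch of the reservation range rule only; the bandwidth and interval rules follow the same count, compare, and reset pattern. The class name, the method name, and every default threshold value are assumptions chosen for illustration. Only the initial range of 1 hop and the upper bound of 3 hops are taken from the simulation settings in Section 4.

```python
from dataclasses import dataclass


@dataclass
class RangeController:
    """Per-cell, measurement-based adjustment of the reservation range RR."""
    rr: int = 1         # current reservation range in hops (RR)
    rr_min: int = 0     # RRmin (assumed value)
    rr_max: int = 3     # RRmax
    tr1: int = 100      # samples needed before a decision (TR1, assumed)
    trh: float = 0.2    # increase RR above this ratio (TRH, assumed)
    trl: float = 0.05   # decrease RR below this ratio (TRL, assumed)
    nr1: int = 0        # QoS handoff requests observed (NR1)
    nr2: int = 0        # ... of which arrived with no reservation (NR2)

    def on_handoff_request(self, had_reservation: bool) -> None:
        """Record one mobile-QoS handoff request and adjust RR if warranted."""
        self.nr1 += 1
        if not had_reservation:
            self.nr2 += 1
        if self.nr1 <= self.tr1:
            return  # not enough samples yet
        ratio = self.nr2 / self.nr1
        if ratio > self.trh and self.rr < self.rr_max:
            # Too many arrivals without reservation: the range is too small.
            self.rr += 1
            self.nr1 = self.nr2 = 0
        elif ratio < self.trl and self.rr > self.rr_min:
            # Almost every arrival was covered: try a smaller range.
            self.rr -= 1
            self.nr1 = self.nr2 = 0
```

As in the pseudo code, the counters are reset only when an adjustment is actually made, so a cell that sits at RRmax or RRmin keeps accumulating statistics.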



3.2 Connection Admission Control (CAC) 

For the bandwidth management to perform properly, control of the reservation
parameters must be accompanied by CAC. As far as the mobile-QoS is concerned, the CAC is
applied separately to new calls and handoff calls. From the viewpoint of mobile-QoS,
calls can be divided into two classes depending on whether they require the
mobile-QoS or not: mobile-QoS calls and non-mobile-QoS calls.

Figures 3 and 4 describe the CAC algorithms for handoff calls and new calls in
pseudo code. Here, for the sake of simplicity, we assume that the reservation range is
uniform. Let BW_req(i), BW_res(i), and BW_avl(i) denote the requested bandwidth,
the reserved bandwidth, and the available bandwidth, respectively, of a cell that is
located i hops away from the cell where the new call was generated. In particular, note
that i=0 indicates the cell where the call was generated (for a new call) or
where the call arrives (for a handoff call). Recall that the bandwidth reservation is
done only once per call, when it is generated; therefore the bandwidth reservation
for the neighbor cells is restricted to new calls only.



if QoS call arrives
    if BW_req(0)<BW_res(0)+BW_avl(0)
        accept the call
    else
        reject the call
else (non-QoS call arrives)
    if BW_req(0)<BW_avl(0)
        accept the call
    else
        reject the call



Fig. 3. CAC algorithm (handoff call) 
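The handoff rule of Fig. 3 translates almost directly into code. The following Python sketch is one possible rendering; the function and argument names are illustrative, and the strict inequality is kept exactly as it appears in the figure.

```python
def admit_handoff(bw_req: float, bw_res: float, bw_avl: float,
                  is_qos_call: bool) -> bool:
    """CAC decision for a handoff call arriving at a cell (cf. Fig. 3).

    bw_req: bandwidth requested by the call, BW_req(0)
    bw_res: bandwidth reserved in this cell, BW_res(0)
    bw_avl: bandwidth currently available in this cell, BW_avl(0)
    """
    if is_qos_call:
        # Mobile-QoS calls may draw on reserved plus available bandwidth.
        return bw_req < bw_res + bw_avl
    # Non-mobile-QoS calls compete for the available bandwidth only.
    return bw_req < bw_avl
```

The only difference between the two classes is whether the reserved pool counts toward admission, which is what gives mobile-QoS handoffs their lower blocking probability.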



if QoS call arrives
    if BW_req(i)<BW_avl(i) for all i=0 to H
        reserve bandwidth and accept the call
    else
        reject the call
else (non-QoS call arrives)
    if BW_req(0)<BW_avl(0)
        accept the call
    else
        reject the call



Fig. 4. CAC algorithm (new call)
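A corresponding sketch for the new-call rule of Fig. 4 is given below. It rests on two assumptions that the figure leaves implicit: H is taken to be the reservation range in hops, and each ring is summarized by a single available-bandwidth value (the uniform reservation range case). The function only returns the decision; updating the reserved and available bandwidth after acceptance is left to the caller.

```python
from typing import Sequence


def admit_new_call(bw_req: Sequence[float], bw_avl: Sequence[float],
                   is_qos_call: bool) -> bool:
    """CAC decision for a new call (cf. Fig. 4).

    bw_req[i] and bw_avl[i] are the bandwidth requested from, and available
    in, a cell i hops away from the originating cell, for i = 0..H.
    """
    if len(bw_req) != len(bw_avl) or not bw_req:
        raise ValueError("bw_req and bw_avl must be non-empty and equal in length")
    if not is_qos_call:
        # Non-mobile-QoS calls need bandwidth in the originating cell only.
        return bw_req[0] < bw_avl[0]
    # Mobile-QoS calls need bandwidth in every ring of the reservation range.
    return all(req < avl for req, avl in zip(bw_req, bw_avl))
```

Because a mobile-QoS call must pass the check in every ring, a larger reservation range makes new-call admission stricter, which is consistent with the trade-off against new call blocking reported in the simulations.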



4 Simulation Results 

We evaluate the performance of the proposed bandwidth management scheme 
through computer simulations. 

For simplicity, the simulations use the following assumptions.

The cell structure is linear (e.g., a highway).

The radius of each cell is uniformly distributed with an average of 1 km.

The number of available channels in each cell is limited to C=60.

The maximum bandwidth is B=1.6Cb, where b is the basic unit of bandwidth.

The amounts of reservation bandwidth for mobile-QoS calls and non-mobile-QoS
calls are 2b and b, respectively.

The ratio of generation rates for mobile-QoS and non-mobile-QoS calls is 3:7.

The call holding time is exponentially distributed with an average of 180 sec.

The mobile’s speed is uniformly distributed over the range of 0 to 100 km/h.

Initial values of the reservation parameters are as follows.

The initial reservation range is 1 hop, upper bounded by 3 hops.

The initial reservation interval is 70 sec.

The initial reservation bandwidth for mobile-QoS calls amounts to b/8.

During the simulations, the reservation parameters show the following statistics.

The average reservation range converges to 2 hops.

The average reservation interval approaches 120 sec.

The average reservation bandwidth for mobile-QoS calls increases by 10%.

Throughout the simulations, our main concern is to measure the degree of mobile- 
QoS focused on the blocking probabilities for new calls and handoff calls. In terms of 
those parameters, the proposed scheme is compared to the case without reservation. 
Bandwidth utilization is also compared for two cases. 

Figures 5 through 7 show the simulation results. From Figures 5 and 6, we see that
the proposed scheme notably reduces the handoff call blocking probability with a
slight degradation of new call blocking probability. This degradation becomes almost
indistinguishable as the call arrival rate increases. In Figure 7, we also observe
that the proposed scheme shows lower bandwidth utilization. The reason is that the
rest of the available bandwidth is occupied by reservations instead of new calls.




Fig. 5. New call blocking probability 



5 Conclusion 

In this paper, we proposed a bandwidth management scheme to support seamless
mobile-QoS in mobile networks. The proposed scheme is based on time-selective
bandwidth reservation, and can dynamically adjust the reservation parameters
depending on the measured traffic conditions. Reservation control was designed to be
as simple as possible to reduce signaling and computational overheads. At the same
time, care was taken to preserve the dynamic features of the proposed scheme.
Fig. 6. Handoff call blocking probability




Fig. 7. Bandwidth utilization 



From the simulation, we could confirm that bandwidth reservation provides an
effective way to support the mobile-QoS, since it can lower the handoff blocking
probability to the desired level. However, bandwidth reservation may easily lead
to underutilization and needs to be carefully controlled. More work still needs to be
done to find optimal reservation parameters in a variety of network environments.
Abstract. This paper lists various limitations of 2.5G and 3G cellular networks
regarding services, and contrasts them with a characterization of fourth-generation
wireless networks (4G). In order to investigate 4G’s feasibility, cost-effectiveness,
necessary functionality, and its potential with respect to services
we created a testbed by adding wireless extensions to a public Gigabit IP- 
network, which was further enhanced with VoIP/SIP to deliver interoperability 
with existing services and to enable mobile multimedia applications. We 
present our experiences and analyze the cost-effectiveness of providing 
(wireless) access to existing voice services in this testbed, how and where these 
services and end-users can be hosted, and how interworking with services over 
other networks can be arranged. Furthermore, we present a service architecture 
to negotiate adaptive mobile multimedia communication, with minimal shared 
service knowledge, which enables applications to adapt to and make optimal 
use of the heterogeneous mobile infrastructure. We demonstrate the latter by 
presenting results from building a mobile-aware media-player, and extend it 
further to take into account the user’s context. In conclusion, we show the 
feasibility and cost-effectiveness of building 4G networks and provisioning of 
adaptive mobile multimedia applications by extending the emerging broadband 
infrastructure with existing wireless LANs. 



1 Introduction 

Previous work [1] has shown that multimedia services, including voice, can be 
delivered over wireless links with end-to-end IP-connectivity, in addition to the data 
services that the Internet already provides. This means that the services can be 
agnostic about the network link layer provided that minimal conditions for delivering 
the service are met (e.g., latency, bandwidth, upper-boundary for packet-loss, etc.). As 
a consequence, a third-party application provider can deliver an application to an end- 
user by any network that meets the minimal requirements of the application. 
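The idea that a service only cares about minimal delivery conditions, not about the link layer underneath, can be illustrated with a small Python sketch. The type names, field names, and units below are hypothetical and chosen purely for illustration; the paper itself names only latency, bandwidth, and an upper bound on packet loss as example conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkProfile:
    """What a given network path currently offers."""
    latency_ms: float
    bandwidth_kbps: float
    packet_loss: float  # loss ratio in [0, 1]


@dataclass(frozen=True)
class AppRequirements:
    """Minimal conditions an application needs in order to be delivered."""
    max_latency_ms: float
    min_bandwidth_kbps: float
    max_packet_loss: float


def can_deliver(link: LinkProfile, req: AppRequirements) -> bool:
    """True if the link meets every minimal condition of the application,
    regardless of which link-layer technology provides it."""
    return (link.latency_ms <= req.max_latency_ms
            and link.bandwidth_kbps >= req.min_bandwidth_kbps
            and link.packet_loss <= req.max_packet_loss)
```

A provider could evaluate such a check against each network currently reachable by the end-user and deliver the application over any one that passes.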

Concerning wireless networks, we need to move the point of integration of these 
services out of the cellular access network in order to enable mobile users to directly
interact with Internet content. GPRS (General Packet Radio Service) provides direct 
Internet access to mobile users and enables the development of multimedia 
applications for the mobile device. These applications can directly integrate content 
that resides anywhere on the Internet. EDGE (Enhanced Data-rate for GSM 
Evolution), the successor of GPRS, will further increase the bit-rate and thereby
further relax the limits on the mobile applications and their use of Internet content, 
thus bringing even more multimedia applications to mobile devices. However, while 
upgrading existing GSM-systems with packet-data services is a logical step with a 
plausible business case, we should question any steps beyond that from the 
perspective that alternative technologies and infrastructures are already available and 
being deployed to provide wireless packet-data services. 
Fig. 1 3G Evolution and beyond



2 Services Architectures (Towards 3G) 

This section characterizes the service architectures that are used in 2G, 2.5G, 3G, and 
beyond. In addition it characterizes the properties of 4G wireless networks. Fig. 1 
provides an overview. 



2.1 2G 

In 2G, mobile devices authenticate themselves and the identity of the user while 
reporting their location to the Home Location Register (HLR). Speech or data 
sessions are based on circuit switching of radio channels. A very limited packet data 
service is provided by SMS. Except for SMS, all services are mutually exclusive. 
Additional client software in the mobile device (e.g., for Personal Information 
Management) may be used to invoke the services resulting in so-called Smart-Phones. 
WAP-clients in the mobile device offer a simple interface to Internet content that can 
only be accessed through a WAP Gateway, which translates between IP and WAP 
protocols. A web server on Internet can eliminate the need for an HTML filter by 
publishing pages with Wireless Markup Language (WML) tags. Through the use of
WML Script content, other services can be invoked (e.g., sending short messages, 
invoking calls). By following a specific URL, the user can download and play a video 
from a media server. Using WAP in the mobile terminal causes user services to be 
strictly dependent on the functionality of the WAP-GW, and thus dependent on the 
network operator. In addition, circuit switched network access disallows
asynchronous application events, which greatly limits the type of services that can be
offered to users in a meaningful way.
Fig. 2 Service Architectures



2.2 2.5G

GPRS and EDGE will remove some of these limitations, by offering packet data
service. Mobile terminals authenticate themselves to the GGSN and report the 
location of the user to the HLR through the SGSN. The Mobile Device obtains an IP- 
address from the GGSN. There are different traffic classes allowing for combinations 
of switched GSM and packet-switched GPRS traffic. The current standard for GPRS 
data traffic incurs considerable latency by interleaving data (in order to increase the
reliability of data transfer) and by allowing for per-packet establishment of radio bearers
(in order to optimize utilization of radio resources). The operator is still in the position
to encourage, if not require that the mobile device be configured to use servers in the 
operator’s service network to setup multimedia sessions. 





3G and Beyond & Enabled Adaptive Mobile Multimedia Communication 15 



A SIP server can be used to setup multimedia communication between end-points. 
This can be further enhanced by adding a Parlay API [18] to the SIP server in order to
execute servlets via a Corba interface on a web server. A web browser can be used for 
customer control of the services — in what can be regarded as a Virtual Home 
Environment (VHE), with integrated interfaces to a Service Control Point (SCP), in 
order to be able to control legacy services. Scripted mobile code can be sent to and 
executed by agents that are co-located with an application client in the mobile device 
[19]. Moving the execution of code to the mobile device has various advantages, e.g. 
performance, and allows the device to report local states back to the server. 



Parlay: The Parlay architecture is based on CORBA interfaces that enable hosting of 
applications outside of specific networks while accessing resources in other networks 
through gateways that are installed by the network operator, making these 
applications and services available to the user irrespective of the network in which the user is 
located. The Parlay API specifications are open and technology-independent, so 
that anyone can develop and offer advanced telecommunication services. 

Clearly, we can move services between different networks, but only within Parlay 
domains, and this process is entirely controlled by the network operators. Fig. 3 shows 
an example of how a simple service using these interfaces can be built. This example 
was used to prototype wake-up calls and location-dependent information push 
services in a mobile network. 

What is particularly important about this example is that the controlling web 
interface and the application are only synchronized through network-based servers 
across a network boundary. Mobile code can be sent to the device to enhance user 
interaction, but the process must be carried out under the supervision of the 
application servers and requires synchronization across network boundaries. 
Furthermore, the Parlay interface must be changed each time to reflect capabilities 
that are present or introduced in SIP [1]. Parlay has these two limiting properties in 
common with other network-centric service architectures, such as WAP, VHE (see 0), 
or TINA-C [20]. 
2.3 3G Phase 1 



Mobile terminals authenticate themselves and report the location of the terminal to the 
HLR through the combined SGSN and GGSN (which also assigns an IP address to 
the mobile terminal). 3G Phase 1 supports real-time and isochronous multimedia (e.g., 
voice calls) using end-to-end connectivity over wireless links, which is set up using 
servers in the operator’s service network to negotiate session parameters regarding 
quality of service (QoS) levels. A Virtual Home Environment ensures that user access 
to services is independent of the location of the terminal, and that the user interface is 
independent of the terminal, for instance using (as in 2.5G) a web interface (HTTP) 
and Java for customer control. 

In summary, the service architecture does not differ in principle from the one in 
2.5G: the service architecture offered by an operator of a GPRS, EDGE, or 3G 
Phase 1 network requires any communication beyond simple browsing of web pages 
to be mediated by servers in the operator’s Service Network. Negotiation of services 
and levels of QoS linked to network-specific AAA and mobility mechanisms 
effectively blocks any possibility of importing or exporting services to or from the 
Internet ad hoc. While this service architecture makes perfect sense from an operator’s 
point of view (and follows an established business model), it rules out, or at best 
greatly complicates, support for the types of communication that we 
propose. Service mobility between this and other networks can in principle be solved 
on a per-service basis, with adaptations to deal with the specific requirements for 
mediating functionality in the Service Network. However, we believe this makes 
services harder to deploy rather than easier! 
2.4 3G Phase 2 

In 3G Phase 2, Mobile IP is used for handoffs and roaming between 3G and other 
networks; hence user mobility is no longer controlled by the (3G) GSN nodes. 
Mobile terminals authenticate themselves to AAA-servers via the integrated SGSN 
and GGSN node (IGSN), which also acts as a foreign agent for Mobile IP, thus 
assigning a visiting IP address to the mobile terminal. The IGSN reports the location 
change to the home agent of the mobile terminal and forwards the AAA information 
to the HLR for charging purposes. This AAA and mobility scenario enables the 
mobile to negotiate communication with resources outside of the 3G networks 
without intervention of servers in the operator’s Service Network. Naturally, the 
operator can offer support for different levels of QoS and even differentiated charges, 
but the fact remains that the services are negotiated end-to-end, and not simply inside 
the operator’s Service Network. However, mobile terminals are required to have 
detailed knowledge of such support services, which may differ between networks and 
may change over time. Thus, we need a means to describe shared knowledge of 
support services and also a means to automatically obtain such knowledge in order for 
services and mobile devices to migrate between networks. This is one of the design 
goals of the eXtensible Service Protocol [16], see also section 0. 



3 Fourth Generation Wireless (4G) 

Since multimedia can be delivered with end-to-end IP connectivity over wireless links 
[1], we can extend all existing voice services to these networks. So-called 
‘hot spots’ equipped with wireless LAN (WLAN) extensions to the Internet are 
becoming available and today provide even higher bandwidths (e.g., 11 Mbps 
in IEEE 802.11b); examples include Telia’s HomeRun [3] system, corporate WLANs, and 
“semipublic” WLANs. 

This is particularly important, since broadband Internet access is being provided in 
a rapidly increasing number of public locations (hot spots) and even homes in urban 
areas. This broadband Internet access is being installed and provisioned 
by power companies, transportation companies, housing co-operatives, joint ventures 
of municipalities, etc., all of whom have a radically different business model from 
traditional telecom vendors and operators of cellular networks. Extending this 
packet-switched infrastructure with wireless access points, such as IEEE 802.11b Wireless 
LAN, is straightforward. In addition, mobility solutions, such as Mobile-IP and IPv6, 
are available to provide the scalability that accommodating millions of 
users and devices will require [4,5]. 

Furthermore, solutions for direct access to the Internet (not requiring an existing 
subscription, but rather a direct settlement, e.g., with E-cash) are available [6]. In fact, 
such operators simply provide IP access; they do not necessarily even need to do 
authentication, authorization, and accounting (AAA), since they get paid directly or 
indirectly. 

Consequently, users with mobile devices can, in principle, use any service from 
any third party, without any intervention by the operator that provides the network 
access. It should be noted that attempts to limit the customer’s choice by incumbent 
operators have been found to violate the EU’s competition laws. 
Thus, the properties of 4G are such that it provides users with (1) multimedia over 
end-to-end IP (wireless) links with (2) high-bandwidth, between (3) multiple, 
heterogeneous, access networks, and with (4) direct access to the Internet and thus 
end-to-end IP-connectivity to (5) third-party mobile multimedia services, without the 
need for prior subscription for Internet access with these access network operators. 



4 Problem Statement 

Two novel aspects characterize the resulting fourth generation wireless network 
scenario (4G). First, the network consists of a conglomeration of heterogeneous 
networks that provide end-to-end IP connectivity over wireless links. In addition, 
aside from mobility support for devices (e.g., Mobile-IP) and support for direct 
anonymous public access, it is a "stupid network" scenario, where the network only 
provides packet transport, and therefore it is an “operator-less” network with respect 
to services. 

The network scenario for 4G networks as outlined in the introduction appears to be 
straightforward, but we need empirical results in order to know how easy and cost- 
effective it really is to provide wireless access to the Internet to end-users. In addition, 
this network has to provide access to existing services of which telephony (voice) is 
the most important one. 

We are also in the position to provide new mobile multimedia applications that 
take advantage of the fact that end-points have computing capabilities, and through 
sensory capabilities can have knowledge about the context of users. Furthermore, the 
“operator-less” service scenario also implies that the mobile users and devices that 
participate in communication over 4G must become smarter, i.e., they must be able to 
respond to a wide range of events: 

1. Other users, mobile devices, and communication resources may become "visible" 
in an ad-hoc fashion, either by proximity or by actively communicating. 

2. Entities (users, mobile artifacts, and virtual objects) may exchange events that 
range from simple invitations to join a session all the way to manipulations of 
shared virtual objects. 

3. The communication conditions will vary between and even within access 
networks. This is especially the case where wireless communication is concerned. 
Applications must be able to act reasonably given knowledge of the situation. 

An application architecture for such adaptive, mobile personal communication has 
been described in [7,16] featuring mobile agents, VoIP/SIP-enabling multimedia 
applications, using end-to-end IP communication between users over wireless links. 

The questions are thus how to provide end-users with easy public access to these 
services, what functionality is needed, and how to demonstrate the 
feasibility and cost-effectiveness of the architecture and its potential regarding 
applications. Our hypothesis is that such an approach to 4G is both practical and 
feasible, and that it can bring such adaptive mobile multimedia 
applications to end-users. To test this hypothesis, we built an experimental 
fourth-generation wireless test bed infrastructure and verified our application 
architecture by prototyping; our results are shown below. 
5 Experimental Network 



We have built an experimental fourth-generation wireless testbed by extending 
Internet42, an existing Gigabit-Ethernet IP-network [9] (Fig. 5). 

The project involved several parties: Ericsson Radio, Royal Institute of Technology 
(KTH), Telia, and Brf Bagen. Besides points of presence at research facilities (Ericsson 
Radio, KTH, and Telia) in the Stockholm suburbs of Kista, Alvsjo, and Farsta, Internet42 
also has a point of presence in the center of Stockholm, where it provides, at low cost, 
100 Mbps Internet access to each apartment in a large housing co-operative, Brf Bagen 
[10]. We have extended the services of Internet42 [9] by adding 11 Mbps wireless packet 
data access points (IEEE 802.11b), agent servers, media servers and content management, 
voice gateways (VGW) with anonymous direct access to Internet (DIA), support for 
device mobility (Mobile-IP) and service mobility (SIP). We will add GPRS early next 
year to our test bed. The functional components are further explained in the 
sections below. 




Fig. 5 Experimental Network Overview 



5.1 Brf Bagen 

In April 1998 the housing cooperative Brf Bagen in central Stockholm installed a 
local area network in all 261 apartments and in all companies located in the buildings 
[10]. The main use of the LAN is to provide Internet access through a leased 2 Mb/s 
line. 

Gigabit Ethernet is used both to connect the buildings to the Internet42 backbone 
and between the five Ethernet switches, which provide each user with 100 Mb/s 
Ethernet to the Ethernet switch. The housing cooperative acts as an operator with the 
following distinguishing characteristics: (1) Users get real IP-numbers, either 
statically (for servers) or dynamically through DHCP, (2) there is no firewall to the 
Internet, and (3) there is no restriction on traffic, either between the users or to the 
Internet. 

The only local services provided are mail, local personal web pages, and local 
news. Currently 56% of the apartment owners are actively connected to the net. A few 
companies are also connected, and share the bandwidth with the apartment owners. 
The residential LAN infrastructure has worked very well with the exception of a few 
prolonged interruptions on the Internet connection, which led Bagen to change ISP 
after a completely open tender. An important note is that this was possible and relatively 
easy because Bagen owns the LAN. 



5.2 Wireless Access 

Extending a fixed Ethernet network with wireless LAN access points (IEEE 802.11b) 
near the points of presence in research facilities is straightforward. Adding wireless 
LAN to a housing co-operative, and thereby providing wireless access to the Internet 
over its infrastructure in a public space, is a different matter, confronting the housing 
co-operative with both technical issues (mainly security) and non-technical questions 
(concern about antenna aesthetics) [3,6,14]. The effort and cost to provide broadband 
wireless packet data was very low. As usage grows we can add access points. With a 
single access point we obtained good coverage in a large public space at low cost 
(Fig. 6): the cost of hardware was $2600 and the area covered was 200 m in radius, 
i.e., about 126,000 m², or roughly $0.02/m². Users share up to 11 Mbps of bandwidth 
via a single access point, but wireless LAN technology (802.11b) allows us to add 
access points as the user density and demands increase. Monitoring throughput during 
videoconferences, we observed 80-90% network utilization. 
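
The cost-per-area figure quoted above can be reproduced with a short calculation. The sketch below assumes an idealized circular coverage area; the function name is illustrative and not part of the test bed.

```python
import math


def coverage_cost_per_m2(hardware_cost_usd, radius_m):
    """Return (covered area in m^2, hardware cost per m^2) for a circular cell."""
    area_m2 = math.pi * radius_m ** 2
    return area_m2, hardware_cost_usd / area_m2


# Figures from the text: $2600 of hardware, 200 m coverage radius.
area, cost = coverage_cost_per_m2(2600.0, 200.0)
print(f"area = {area:.0f} m^2, cost = ${cost:.3f}/m^2")
```

Real coverage is rarely a perfect circle, so the result should be read as an order-of-magnitude estimate only.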

5.3 Direct Internet Access 

Wired Equivalent Privacy (WEP) in IEEE 802.11b serves the dual purpose of 
authenticating users and providing data encryption, but has the following disadvantages: 
(1) WEP differs between manufacturers, (2) WEP encryption keys must be manually 
distributed, (3) WEP is set on a per-network basis rather than on a per-user basis, and 
(4) Windows-based machines must be rebooted after a key change. Alternatively, the 
WLAN infrastructure can be complemented with an authentication mechanism based 
on pre-shared or certificate-based keys. However, this approach precludes anonymous 
roaming access. Therefore, we used a third method, Direct Internet Access (DIA) [6], 




Fig. 6. IEEE 802.11b Coverage at Brf. Bagen in 
Stockholm (Photograph Courtesy of Lantmateriet © 
2000) 
which provides anonymous authentication and allows the access provider to charge 
via eCash. This approach thus makes access authentication keys redundant, and 
allows simple roaming access. Consequently, there is no reason to do accounting or 
administration of users. Additional security is ensured using IPsec and IKE [14]. 



5.4 SIP 

A SIP redirect server allows end-users to register with a SIP URL and enables others 
to send them invitations to multimedia communication (enabling personal service 
mobility). Assigning these identities to Personal Agents allowed us to leverage 
this functionality to easily implement remote customer control of personal messaging 
(via web pages) and Internet telephony (e.g., diverting calls when in a meeting), 
where the voice gateway allows us to locate agents, locally or remotely, as SIP URLs by 
identifying telephone numbers and vice versa. When we allowed the personal agent to 
monitor incoming calls to its number via the telephony gateway using a group number, 
a consistent and complete (i.e., messaging, Internet, and switched telephony) 
solution was achieved for personal communication. SIP invitations can now also be 
sent through firewalls and NAT [11], which might allow a local network access 
operator (e.g., Brf Bagen) to increase its security while preserving all services. 
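
The registration and lookup behaviour described above can be illustrated with a toy model. This is a minimal sketch, not the test bed's implementation: the class, its methods, and the example addresses are all hypothetical, and a real SIP redirect server would answer an invitation with a redirection response rather than return a Python value.

```python
class RedirectServer:
    """Toy stand-in for a SIP redirect server (hypothetical API)."""

    def __init__(self):
        self._bindings = {}  # SIP URL -> current contact address
        self._numbers = {}   # telephone number -> SIP URL

    def register(self, sip_url, contact, phone_number=None):
        """Bind a SIP URL to a contact; optionally associate a phone number."""
        self._bindings[sip_url] = contact
        if phone_number is not None:
            self._numbers[phone_number] = sip_url

    def redirect(self, target):
        """Resolve a SIP URL or telephone number to a contact, or None."""
        sip_url = self._numbers.get(target, target)
        return self._bindings.get(sip_url)


server = RedirectServer()
server.register("sip:agent@example.org", "sip:agent@192.0.2.7", "+46800000000")
by_number = server.redirect("+46800000000")

# Diverting calls (e.g., when in a meeting) is just a new registration.
server.register("sip:agent@example.org", "sip:voicemail@example.org")
diverted = server.redirect("sip:agent@example.org")
```

Because callers only ever use the stable SIP URL or telephone number, the agent can change the contact at any time without the caller noticing.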



5.5 Mobility 

Strategies using Mobile-IP or other network mobility protocols can be used to support 
handoffs [4,5]. We used the MosquitoNet Mobile-IP stack [19] to enable our devices 
to do handoffs between GSM-data and WLAN, in order to investigate the feasibility 
of VoIP handoffs. These attempts proved unsuccessful for several 
reasons. GSM-data session setup through a dial-in connection takes too 
long, a problem that is fixed by using GPRS. However, we found that infrequent agent 
advertisements (delays of at least 1 s, RFC 2002) overshadow the 500 ms 
latency in the GPRS air interface. Thus a modified approach is needed (e.g., with 
micro-mobility). Mobility for voice needs to be separated from mobility of the mobile 
device, so as to circumvent unnecessary delays due to triangular routing: 
location changes are used to send voice packets directly to the new address, either via SIP or 
simply using RTP, thus resulting in minimal delays. 



5.6 Quality of Service 

As there was ample bandwidth, the use of speech codecs was unnecessary from a QoS 
perspective. There was no perceivable packet-loss from the perspective of the end- 
user. However, in areas where the signal is weak, we may benefit from using robust 
header compression [12]. 

When end-to-end security needs to be guaranteed, IPsec is an obvious choice. 
When speech and signaling use different ports, the increased header length can be 
dealt with separately by applying ROCCO [15] to this stream of IP packets without 
requiring a trust relation to be established between the mobile device and the access 
point. This allows us to establish the direct Internet access strategy without AAA 
functionality, unless it is necessary for commercial or operational reasons. 



5.7 Capacity 

In addition, capacity and spectrum efficiency will benefit from using robust header 
compression [12], as IP-headers incur a considerable overhead with respect to the size 
of content for this streaming audio. Furthermore, speech compression will further increase 
the number of simultaneous voice users. In ideal cases, with robust header 
compression, and compressing 16 Kbps PCM to around 6 Kbps G.723, the number of 
possible simultaneous voice users, in a single 11 Mbps cell could be well over one 
thousand, corresponding to lO’ users in a macro cell, with the possibility of adding 
more access points if more capacity is needed. 
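
The effect of header compression on the number of voice streams can be estimated with simple arithmetic. The sketch below rests on assumptions that are not in the text: a codec producing 24-byte frames every 30 ms (roughly 6 Kbps), 40 bytes of uncompressed IP/UDP/RTP headers, about 2 bytes of compressed header, and no allowance for 802.11b MAC or contention overhead. It counts one-way streams against the nominal 11 Mbps rate, so it is an ideal upper bound rather than a prediction.

```python
def streams_per_cell(link_bps, payload_bytes, header_bytes, frame_interval_s):
    """Ideal number of one-way voice streams that fit in the link rate."""
    per_stream_bps = (payload_bytes + header_bytes) * 8 / frame_interval_s
    return int(link_bps // per_stream_bps)


LINK_BPS = 11_000_000      # nominal IEEE 802.11b rate
PAYLOAD_BYTES = 24         # assumed codec frame size (about 6 Kbps)
FRAME_INTERVAL_S = 0.030   # assumed 30 ms framing

uncompressed = streams_per_cell(LINK_BPS, PAYLOAD_BYTES, 40, FRAME_INTERVAL_S)
compressed = streams_per_cell(LINK_BPS, PAYLOAD_BYTES, 2, FRAME_INTERVAL_S)
print(uncompressed, compressed)
```

Under these assumptions the ideal count rises from 644 to 1586 streams, which is consistent with the "well over one thousand" estimate for the ideal case.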



5.8 Hosting, Interworking 

An important aspect for parties such as Brf Bagen, whose focus is to make 
connectivity available in the infrastructure but who are not interested in directly operating 
services, is that they should be able to outsource hosting of substantial parts of the 
functionality. This is supported by our architecture, where it is of no concern where 
the components are located as long as they are available on the Internet. All this 
functionality could even be packaged as a do-it-yourself 4G kit, since management of 
the functionality is not necessarily more complex than maintaining a web site. 

Interworking with other Internet access providers, both wireless (e.g., Telia 
HomeRun [3]) and fixed, needs to be addressed at two levels. First, the user 
must be allowed to access and roam between networks. There are technical solutions 
available for both. Second, there must be an agreement between parties who decide to 
allow roaming between each other’s networks. This can be supported by a clearinghouse, 
thereby relieving parties of managing mutual agreements. 



6 Enabling Adaptive Mobile Multimedia 

6.1 eXtensible Service Protocol 

Proprietary protocols between the agents would quickly add up to unmanageable 
complexity in the system. The need for a generic protocol for learning and conveying the 
capabilities required to communicate with a resource is addressed by the eXtensible Service 
Protocol (XSP) [16], an XML-based protocol that allows agents to communicate 
their capabilities to other agents so that they can use each other's methods. In our 
prototype we use an XSP-enabled agent to act as a Mobile Interactive Space, with 
which other agents can register, subscribe to events, and query for other entities' presence, 
properties, and methods. Using XSP, they can invoke each other's methods, set properties, or 
extend each other's capabilities. 
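
The operations listed above (register, subscribe, query, invoke, extend) can be pictured with a small in-process model. This is only a sketch under stated assumptions: it uses plain Python objects rather than XML messages, the class and method names are hypothetical, and it does not reproduce the actual XSP message format, which is not given here.

```python
class InteractiveSpace:
    """Hypothetical in-memory model of a space that agents register with."""

    def __init__(self):
        self._entities = {}     # name -> {"properties": dict, "methods": dict}
        self._subscribers = {}  # event name -> list of callbacks

    def subscribe(self, event, callback):
        self._subscribers.setdefault(event, []).append(callback)

    def publish(self, event, payload):
        for callback in self._subscribers.get(event, []):
            callback(payload)

    def register(self, name, properties=None, methods=None):
        self._entities[name] = {
            "properties": dict(properties or {}),
            "methods": dict(methods or {}),
        }
        self.publish("registered", name)

    def query(self, name):
        """Return an entity's properties and method names, or None if absent."""
        entity = self._entities.get(name)
        if entity is None:
            return None
        return {
            "properties": dict(entity["properties"]),
            "methods": sorted(entity["methods"]),
        }

    def invoke(self, name, method, *args):
        return self._entities[name]["methods"][method](*args)

    def extend(self, name, method, func):
        """Add a new capability to an already registered entity."""
        self._entities[name]["methods"][method] = func


space = InteractiveSpace()
seen = []
space.subscribe("registered", seen.append)
space.register("player", {"codec": "mp3"}, {"play": lambda url: f"playing {url}"})
space.extend("player", "stop", lambda: "stopped")
description = space.query("player")
result = space.invoke("player", "play", "rtsp://media.example.org/track")
```

The point of the model is that an agent needs no prior knowledge of another agent's interface: it can discover the available methods through a query and then invoke them.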
user. The agent is able to monitor events, make intelligent decisions about the user's 
communication context, contact other agents if need be, and invoke communication, 
and leverage the fact that we can use sensors on the mobile device [13], or in the 
environment (e.g., GPS) for even more flexible adaptation of the communication to 
the user’s context [7]. We used this model for prototyping user context-dependent 
information retrieval and voice-communication invocation, presented in the next 
section. 



Fig. 7. Application Architecture and Functionality 



6.2 Smart Delivery of Multimedia 

We have created a Mobile Aware Media Player that takes into account user 
movements in the network and the resultant changes in communication conditions, as 
well as on-going negotiation of content delivery according to the availability of (new) 
multimedia content on Internet media stations and intermediate media stores in the 
access networks [2]. The Personal (mobile) Agent uses the extensible Service 
Protocol for the necessary flexible negotiation between entities (Internet media 
stations, intermediate media stores, and end-users) [16]. 

The Personal (mobile) Agent connects to an Internet Media Station with MP3 
content, which in turn diverts communication to a Content Proxy Agent in the access 
network. The Content Proxy also extends the functionality of the Personal Agent by 
sending a protocol object for streaming and playing out MP3-audio using RTSP when 
the user is on-line. Multimedia delivery is redirected to an optimal point of access 
from a user (price/performance) perspective, based on user context information: e.g., 
Access Network Agents notify Content Proxies in the access network of available 
bandwidth, and the Location Agents provide location prediction information, on the 
basis of which the Content Proxy and the Personal Agent decide to receive content in a 
hot spot with 802.11b WLAN. 
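
The price/performance decision described above might be sketched as follows. The selection rule (cheapest predicted-reachable access point that meets a minimum bandwidth, with ties broken by higher bandwidth) is an assumption made for illustration, as are all names and numbers; the prototype's actual decision logic is not specified here.

```python
from dataclasses import dataclass


@dataclass
class AccessPoint:
    name: str
    bandwidth_kbps: float  # as reported by an Access Network Agent
    price_per_mb: float    # illustrative tariff
    reachable: bool        # as predicted by a Location Agent


def choose_delivery_point(candidates, min_kbps):
    """Pick the cheapest reachable access point that meets the bandwidth need."""
    usable = [c for c in candidates if c.reachable and c.bandwidth_kbps >= min_kbps]
    if not usable:
        return None
    return min(usable, key=lambda c: (c.price_per_mb, -c.bandwidth_kbps))


candidates = [
    AccessPoint("cellular", 40, 0.50, True),
    AccessPoint("hotspot-wlan", 5000, 0.01, True),
    AccessPoint("office-wlan", 5000, 0.00, False),
]
best = choose_delivery_point(candidates, min_kbps=64)
```

With these illustrative values the hot spot is chosen: the cellular link is too slow for the stream and the cheaper office WLAN is not predicted to be reachable.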
Fig. 8 Smart Delivery of Multimedia 




7 Conclusions 



Fig. 8 Streaming Content at 64 Kbps (minimum) 



Our contribution provides real-world experience from pioneering the building of 
a large-scale, deregulated, multimedia-enabled mobile Internet, to which existing voice 
services have been moved successfully. In it, a novel open application 
software architecture (i.e., mobile agents and a novel extensible service protocol) 
leveraged the combined mobility and flexibility of end-to-end IP over wireless in 
entirely new classes of applications at the intersection of mobile and ubiquitous 
computing and cellular telephony. We provide evidence of its feasibility and 
practicality. 

Thus, in addition to the properties of 4G in section 0, our approach to 4G offers 
functionality that can be installed and managed by end-users themselves or such 
organizations whose business concept is likely to be focused on deriving revenues 
other than providing Internet access. This functionality must lend itself to be 
packaged as self-manageable functionality, or be outsourced in case that is a more 
suitable model for network operation. The recipe for putting up such a do-it-yourself 
4G is to simply: 

1. Package functionality (e.g., voice gateway, SIP server, agent server, etc.) 

2. Negotiate fixed Internet access (e.g., xDSL, cable, fiber, etc.) 

3. Decide whether you want the functionality hosted. 

4. If so, just connect the antennas and put them on your roof. 

5. Negotiate your service level with the clearinghouse and have it announced. 

We have provided wireless LAN (IEEE 802.11b, 11 Mbps) based public access to 
the Internet in various public locations in Stockholm, one of which is the housing 
co-operative Brf Bagen. Thus, we have shown that providing 
end-users with broadband end-to-end IP communication over wireless links using 
direct access to the Internet is perfectly feasible and cost-effective, and enables adaptive 
mobile multimedia communication. This functionality can either be packaged or 
outsourced to become usable and manageable by end-users or organizations whose 
focus is not primarily to own or operate networks. Thereby, we create a public 
broadband wireless Internet infrastructure that is not regulated in any sense and can, 
in principle, be used by anyone. This unregulated infrastructure provides connectivity 
to services that, irrespective of whether they are located locally or elsewhere, are not 
part of the network. Any transaction between an end-user and the application service 
provider is conducted without the network access provider having any knowledge or 
role. On the other hand, an application service provider may benefit from being able 
to get support from the operator for end-users to maximize performance of their 
service in these hot spots. Such a scenario has been outlined in [1], in which case it is 
plausible that the provider of a hot spot will be compensated, thereby creating 
an additional incentive for putting up such networks. Furthermore, we demonstrated a 
mobile application which, given our application architecture, enables users and 
mobile devices to negotiate communication that takes into account user context and 
communication conditions. 

In conclusion, by virtue of our results, we claim that development of infrastructure 
for 3G should be aligned according to the criteria that were discussed in this paper. 
Furthermore, R&D efforts regarding applications should focus on enabling mobile 
multimedia communication in such a deregulated infrastructure by adopting a service 
architecture that promotes deregulation on an application level [16]. 
8 Future Work 

In order to fully understand the challenges of deploying unregulated 4G 
infrastructure along the lines of this paper, we will further address the provision of 
anonymous access to the Internet, including AAA. We have already started an 
investigation regarding the role and functionality of a clearinghouse between access 
operators for our scenario. 

We will also investigate our network scenario’s potential with 
respect to applications and enabling mechanisms. We have started to prototype novel 
applications that are based on the application architecture described in [2], such as 
a context-aware 3D space that can be shared by mobile users who can have 
simultaneous voice communication by means of VoIP. Furthermore, results of 
prototyping the eXtensible Service Protocol [16] for negotiation of ad-hoc applications 
between multiple users and/or devices will be shown during 2001. 
Abstract. Virtual Home Environment is a new concept that emerged in the 
context of the 3rd generation networks for mobile communications. The 
objective of this paper is to present the work that currently takes place in 
VESPER, an IST project in the area of the VHE. The paper presents the 
VESPER project approach to the VHE architecture, its main components, the 
way they interact and the way this architecture will facilitate the creation of 
services embedded with the VHE concept. The paper also presents two 
demonstration services, their main functional features and how they use the 
VHE architecture defined in VESPER. 



1 Introduction - The VHE Concept 

Virtual Home Environment is a new concept that emerged in the context of the 3rd 
generation networks for mobile communications. The 3rd Generation Partnership 
Project (3GPP) (a standardisation body from the European Telecommunications 
Standards Institute - ETSI) [1] defines VHE as: “a concept for Personal Service 
Environment portability across network boundaries and between terminals. The 
concept of the VHE is such that users are consistently presented with the same 
personalised features, User Interface customisation and services in whatever network 
and whatever terminal (within the capabilities of the terminal and the network), 
wherever the user may be located” [2]. 

This innovative concept appeared when terminal and network capabilities 
rapidly increased, enabling the emergence of a wide variety of highly sophisticated 
and personalised services over the widest possible coverage area. In this context the 
user wants to subscribe to services that appear to be the same from the user’s 
perspective, regardless of the access point in use, even when using different types of 
terminals, and independently of the physical realisation of the service. 

Important aspects that should be covered in a virtual home environment service 
provision context are: 

• Personalisation of Service Environment 

• Adaptation of Service Environment 

• Service Portability 

• Service Session Mobility. 

Personalisation means the ability of the user to personalise and modify the services 
to which he subscribes, choosing the way to access them (network/terminal preferences) 
and deciding on the behaviour/aspect of the user interface. The service provider 
should keep this personalisation, as much as possible, independent of the user 
terminal and access point. 

Service adaptation covers terminal and network adaptation. The service must be 
adaptable to different kinds of terminals, considering their potentially different 
capabilities. Network adaptability refers mainly to quality of service (QoS) 
adjustments (usually downgrades, but also upgrades), which may be necessary if the 
network conditions change or when the service is offered in different terminals. 

A portable service should be accessible from the user’s home network and from 
various alternative networks with the same look and feel. 

Session mobility allows a user that has an active session on a particular terminal to 
move that session to another terminal (e.g. by suspending it on the first terminal and 
resuming it on the second terminal). 
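
The suspend-and-resume form of session mobility mentioned above can be sketched in a few lines. The class and its fields are hypothetical and serve only to show the idea of carrying a session snapshot from one terminal to another; they are not taken from the VESPER architecture.

```python
class Session:
    """Hypothetical service session that can be moved between terminals."""

    def __init__(self, service, state=None):
        self.service = service
        self.state = dict(state or {})
        self.terminal = None

    def start(self, terminal):
        self.terminal = terminal

    def suspend(self):
        """Detach from the current terminal and return a portable snapshot."""
        snapshot = {"service": self.service, "state": dict(self.state)}
        self.terminal = None
        return snapshot

    @classmethod
    def resume(cls, snapshot, terminal):
        """Recreate the session from a snapshot on another terminal."""
        session = cls(snapshot["service"], snapshot["state"])
        session.start(terminal)
        return session


session = Session("video", {"position_s": 42})
session.start("desktop")
moved = Session.resume(session.suspend(), "handset")
```

In practice the snapshot would also have to be adapted to the capabilities of the second terminal, which is where service adaptation comes in.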

The objective of this paper is to present the work that currently takes place in 
VESPER, an IST project in the area of the VHE. The paper presents the VESPER 
project approach to the VHE architecture, its main components, the way they interact 
and the way this architecture will facilitate the creation of services embedded with the 
VHE concept. The paper also presents two demonstration services, their main 
functional features and how they use the VHE architecture defined in VESPER. 



2 VESPER Project 

In the scope of the European IST programme, the VESPER project (Virtual Home 
Environment for Service PErsonalization and Roaming users - IST-1999-10825 (Key 
Action 1.1. 2-4. 2.4)) intends to define and develop an architecture for the Virtual 
Home Environment realisation and validate it with some demonstration services [3]. 
This architecture must provide ubiquitous service availability, personalised user 
interfaces, service portability and session mobility, while users are roaming or 
changing their terminal equipment. The VHE should hide away from the user the 
variety of access network types (fixed or wireless), the variety of supporting 
terminals, and the variety of the involved network and service providers responsible 
for service provision. 

The project adopts an incremental process for the VHE specification, which will be 
carried out in three phases. The first two phases will provide output to be tested in 
demonstrations, and the third will use the test feedback to produce a 
complete and justified result. 



3 VESPER Architectural Approach 

VESPER aims at offering an architectural solution and an implementation of the VHE 
with the capability of providing services to end users with a consistent look and feel, 
independently of location, serving network and terminal technology. It also facilitates 
service adaptation to different network environments supporting directly connected, 
cordless and cellular access. 

VESPER concentrates on VHE aspects not explored before, while at the same time 
it proposes improvements in other aspects, thus achieving the specification of a 
complete VHE architecture. 

Key innovation areas identified in VESPER are: 

• Service continuity; 

• Service scalability (i.e. adaptability to network, terminal characteristics); 

• Service personalisation (i.e. definition, standardisation and management of 
service and user profiles). 

The VHE architecture, defined by the VESPER Consortium, derives from a 
qualitative combination of selected architecture concepts, coming from several 
sources: Telecommunications Information Networking Architecture (TINA) [4], 
PARLAY [5], Open Service Architecture (OSA) [6,7], Telecommunications Service 
Access and Subscription (TSAS), from the recent work within standardisation bodies, 
such as 3GPP and ITU, and from advanced software technology, including distributed 
processing and agent technologies. 

To specify the service architecture enabling the VHE concept, the project identified 
all the VHE requirements [8]. From these, the VESPER project defined a VHE service 
architecture that is represented in the following two figures. 

Fig. 1 shows the reference VHE architecture. The left side represents the terminal, 
which may have VHE support (in terminals where it is possible to install 
software); the right side shows the VHE Provider (represented by the 
VHE components), the application server and the different networks. 
Transparency between the VHE Provider and the different networks is achieved 
by the OSA/Parlay gateway. 

Fig. 2 depicts the set of VHE components identified by VESPER (the VHE 
components box on the network side of Fig. 1). The components can be 
accessed through the User, Administrator and Service APIs and offer the main VHE 
features to these three actors. 




Fig. 1. Reference VHE architecture for VESPER. 




VESPER also defined a Roaming Model applicable to the VHE, which comprises 
a high-level definition of the entities, domains and roles involved and all the 
interactions among them. A Session Model was also defined by the project. Three 
different types of sessions were identified: 

• VHE Session: a temporary association between a user and a VHE Provider that 
allows the user to use VHE or Value Added Service Provider (VASP) services; 

• VHE Service Session: a temporary association between parties (peer parties or 
client and server) through the mediation of the VHE system. The association may 
subsequently include a VASP Session; 

• VASP Session: a temporary association between a number of peer parties or 
between clients and servers interacting according to a VASP service. 
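
The three session types can be pictured as nested records. The following is a minimal Python sketch, not VESPER's data model: the class names, the fields and the way one session contains another are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VaspSession:
    """Association between peer parties, or clients and servers, of a VASP service."""
    service_name: str
    parties: List[str]


@dataclass
class VheServiceSession:
    """Association between parties mediated by the VHE system.

    It may later include a VASP session, hence the optional field.
    """
    service_name: str
    parties: List[str]
    vasp_session: Optional[VaspSession] = None


@dataclass
class VheSession:
    """Association between one user and a VHE Provider."""
    user: str
    provider: str
    # Assumption: a VHE session keeps track of the service sessions opened within it.
    service_sessions: List[VheServiceSession] = field(default_factory=list)


# Hypothetical usage: a user opens a VHE session, then starts one service.
vhe = VheSession(user="alice", provider="vhe-provider-1")
svc = VheServiceSession(service_name="calendar", parties=["alice"])
svc.vasp_session = VaspSession(service_name="calendar",
                               parties=["alice", "calendar-server"])
vhe.service_sessions.append(svc)
```

Keeping the VASP session optional mirrors the statement that a VHE Service Session "may subsequently include" one.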



4 Creation of 3rd Generation Services 

The emerging 3rd generation networks will enable very attractive 
services for end-users. With the capability of transferring audio, video and data at 
high rates, a large variety of services, expected to be far more valuable to the user 
than the current ones, will rapidly appear. 

As these services will also be available over mobile networks, a natural 
requirement from end-users will be to access personalised services from any 
place, transparently and independently of the underlying network technology and the 
access point. This functionality will not be feasible, however, without ways of 
facilitating the creation of services that already embed those concepts. 

VESPER specifies how the architecture components that specifically provide VHE 
concept features can be associated with the OSA/Parlay frameworks. These 
frameworks are open interfaces between the network and the service provider, 
allowing service provisioning independently of the underlying network 
technology. 

Open Service Access (OSA) and Parlay (defined in the contexts of 
3GPP and the Parlay Group, respectively) provide frameworks and APIs suitable for creating services 
based on standardised service features offered by different network technologies. This 
is achieved with an open application programming interface (API), which allows applications to 
access the functionalities of the network and some generic support functions in a 
secure way. These functionalities include call control services (generic two-party 
calls, multi-party, conference), user interaction services, messaging services and 
mobility services (user location and user status). 

VHE components will also provide an open API to VASPs, enabling and 
facilitating the VHE concept within the service. Features such as personalization, 
session mobility or adaptation to different terminals offered by VHE providers will be 
available to services that use this API. 

To validate the VESPER architecture, the consortium has decided to develop some 
applications, namely a Customer Care service and a Calendar service. These 
applications will use the components provided. To carry out their 
development, the approach was to identify common usage scenarios that fully describe 
the interactions between the users and the VHE Provider (represented here by the 
VHE components) and between the VHE Provider and a service offered by a VASP. 

The sequence diagrams presented next describe examples of these scenarios, which are 
common to all services provided in the context of the VHE. These "standard" ways of 
using various VHE aspects will facilitate the creation and provision of services by the 
service providers. 

The descriptions of scenarios such as logging in to the VHE Provider, starting a service, 
terminating a service or suspending a VASP session emphasise the VHE-specific aspects that bring 
benefits to the services and to the users. 

User Login to a VHE Provider and Start a VASP Service 

The user logs in to the VHE Provider (1) and is authenticated by the system (2). If the 
authentication succeeds, a VHE Session is created (3) and the accounting context of 
this specific user is set (4). The Access component obtains from the Discovery 
component the list of services to which the user is subscribed (5). 

The VHE Provider enables the user to own several User Profiles (UP), each 
containing different Service Preferences (SP) for the same service and the same 
subscriber. Therefore, after the user selects a service to use (6), the Access component 
provides a list of the UPs that contain preferences for that service (7). 
The user selects one UP (8) from this list. This choice also affects the look and 
feel of the service provision. 

Fig. 3. User login to a VHE Provider and start a VASP service. 



The start of the service is requested from the Session Manager (10). After the Session Manager 
obtains the service address (11), a VHE Service Session is created (12) and a 
connection to the VASP is requested (13). The Accounting Manager starts the 
charging process (14) and a request is sent to the VASP to start the selected service 
(15). After that, a VASP session is created and the specific service use cases take place 
in the context of this session. 
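
The login and service start scenario can be sketched in code. The Python class below is a toy stand-in for the VHE components, written only to show the order of the numbered steps; every class, method and field name is hypothetical and none of them is taken from the VESPER, OSA or Parlay APIs.

```python
class VheProvider:
    """Toy model of the VHE components; all names here are hypothetical."""

    def __init__(self, credentials, subscriptions, profiles, service_addresses):
        self.credentials = credentials              # user -> password
        self.subscriptions = subscriptions          # user -> list of service names
        self.profiles = profiles                    # user -> {profile -> {service -> prefs}}
        self.service_addresses = service_addresses  # service -> VASP address
        self.trace = []                             # ordered record of performed steps

    def login(self, user, password):
        self.trace.append("login")                        # step (1)
        if self.credentials.get(user) != password:        # step (2)
            self.trace.append("authentication failed")
            return None
        self.trace.append("VHE session created")          # step (3)
        self.trace.append("accounting context set")       # step (4)
        return list(self.subscriptions.get(user, []))     # step (5)

    def profiles_for(self, user, service):
        # Steps (6)-(7): list the user profiles holding preferences for the service.
        user_profiles = self.profiles.get(user, {})
        return [name for name, prefs in user_profiles.items() if service in prefs]

    def start_service(self, user, service, profile):
        address = self.service_addresses[service]         # step (11)
        self.trace.append("VHE service session created")  # step (12)
        self.trace.append(f"connect to {address}")        # step (13)
        self.trace.append("charging started")             # step (14)
        self.trace.append(f"start {service}")             # step (15)
        return {"user": user, "service": service,
                "profile": profile, "state": "active"}


provider = VheProvider(
    credentials={"alice": "secret"},
    subscriptions={"alice": ["calendar", "customer-care"]},
    profiles={"alice": {"work": {"calendar": {"language": "en"}},
                        "home": {"customer-care": {"language": "pt"}}}},
    service_addresses={"calendar": "vasp.example/calendar"},
)
services = provider.login("alice", "secret")
candidates = provider.profiles_for("alice", "calendar")
session = provider.start_service("alice", "calendar", candidates[0])
```

Recording the steps in `trace` makes it easy to compare a run against the sequence diagram.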

Logout VHE Provider / Terminate Service 

The user requests to log out from the VHE Provider (1). The Access component 
checks whether there is any active VHE service session (2). If the list is empty, all the 
procedures to delete the VHE session and terminate the connection to the VHE 
Provider are performed by the Access component (9-11). 

If there is an active VHE service session, the user is currently using a 
VASP service. In this case the user can choose to terminate the service or 
request to suspend the VASP session (described in the next section). If the user 
chooses to terminate the service (3), the Session component is informed (4). 
Before deleting the VHE Service Session (7), the Session component requests 
the profiles update (5), in case any changes have occurred, and requests the 
Accounting component to stop charging the user for using this service (6). 

Fig. 4. Logout VHE Provider / Terminate service. 

After these procedures, the connection between the user and the VASP is terminated 
(8), as well as the connection between the user and the VHE Provider (9-11). 

Suspend / Resume a VASP Session 

The user requests the VASP to suspend his participation in the service (1). The VASP 
session should be preserved in such a way that the user can later resume his context, 
including his service preferences (2). For that, the Session component is 
informed that the VASP session will be suspended (3) and it requests the Profile 
component to update the service preferences, if any changes have occurred (4). 

The Session component notifies the Accounting component to adapt the charging 
of the user to his suspended state (5). Accounting may not stop but may follow another 
policy. The Session component updates the VHE service session to the user 
suspended state (6) and requests termination of the connection between the user and the 
VASP. 

When the user logs in again to the VHE Provider and selects the suspended 
service (8), he receives a list of suspended sessions so that he can select one to resume 
(10). After the user chooses a UP to define a usage context (11), he informs the 
VHE Provider that he wants to resume a specific VASP session (12). 

The Session component then resumes the VHE service session (14), requests the 
connection to the VASP (15), asks the Accounting component to resume charging 
(16) and informs the VASP to resume the VASP session (17). After that, the VASP 
session is resumed and the specific service use cases take place in the context of this 
session. 
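
Suspend and resume amount to a small state machine over the service session. The sketch below is one possible Python rendering under simplifying assumptions: the class, the two states and the `charging` labels are hypothetical, and the interactions with the Profile, Accounting and VASP components are reduced to attribute updates.

```python
from enum import Enum


class SessionState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"


class ServiceSession:
    """Toy VHE service session; names and fields are illustrative only."""

    def __init__(self, user, service, preferences):
        self.user = user
        self.service = service
        self.preferences = dict(preferences)
        self.state = SessionState.ACTIVE
        self.charging = "normal"
        self.connected = True

    def suspend(self, changed_preferences=None):
        if self.state is not SessionState.ACTIVE:
            raise ValueError("only an active session can be suspended")
        if changed_preferences:
            # Step (4): store preference changes so the context can be restored.
            self.preferences.update(changed_preferences)
        self.charging = "suspended"           # step (5): charging adapts, may not stop
        self.state = SessionState.SUSPENDED   # step (6)
        self.connected = False                # connection to the VASP is terminated

    def resume(self):
        if self.state is not SessionState.SUSPENDED:
            raise ValueError("only a suspended session can be resumed")
        self.state = SessionState.ACTIVE      # step (14)
        self.connected = True                 # step (15)
        self.charging = "normal"              # step (16)


session = ServiceSession("alice", "calendar", {"language": "en"})
session.suspend({"language": "fr"})
suspended_snapshot = (session.state, session.charging, session.connected)
session.resume()
```

Because the preferences are stored before the state changes, the resumed session sees the same context the user left, possibly on a different terminal.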



5 Two Examples of Applications 

INRIA, in France, and Portugal Telecom Inovação, with the collaboration of INESC 
Porto, in Portugal, are partners of the VESPER consortium and are currently working 
on the development of two services that will be used to validate the project VHE 
architecture [9]. These services are the Calendar service and the Customer Care 
service. Developing these two services without a VHE architecture providing basic 
VHE functionalities would be a much harder task, and the result would be a 
solution that is very difficult to port. 



5.1 The Calendar Service 

The Calendar service provides a coordinating environment allowing multiple users to 
set up their meetings. In a distributed context, each user has his own calendar, which 
should follow him when he travels. When the Calendar service 
receives an invitation to a meeting, it must contact each of the invited attendees, 
specifying the meeting time slot and its location, collect the replies from the attendees, 
decide the meeting time and inform the attendees of the final result. 

The server must cope with the fact that the parties involved in the meeting are 
in different networks, use different access mechanisms and that the dialogue with a 
number of the parties may take time because a party is either offline or simply cannot 
answer immediately. This process can be regarded as the coordination of atomic and 
consistent updates of distributed heterogeneous databases (agendas). 

The service resolves the heterogeneity by adapting dialogues to the attendees’ 
conditions, so that the parties are solicited in their preferred way. 

The Calendar service can be used on (almost) any kind of terminal, 
ranging from a mobile telephone or a PDA to a PC. The service adapts accordingly. 
For example, a party on a mobile phone can be contacted either with a short text 
string or with voice; a party connected with a PDA can be contacted with a low quality 
small image showing the plan to access the meeting point, together with text indicating 
the meeting time slot; and a party on a PC can be contacted with a full-colour 
"zoomable" image or via e-mail. 

A user can specify in his service preferences profile where he wants to be reached 
depending on the date or the time (for example, at his work PC from 9am to 6pm, at 
his home fixed phone from 7pm to 10pm) or at certain locations (for example, on his 
hands-free mobile phone while he is driving on the highway). 
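
A time-dependent preference of this kind reduces to a lookup over time windows. The Python sketch below uses the two windows from the example in the text; the rule format and the function name are assumptions, not the VESPER profile schema.

```python
from datetime import time

# Each rule: (start, end, terminal). The windows come from the example in the text.
RULES = [
    (time(9, 0), time(18, 0), "work PC"),
    (time(19, 0), time(22, 0), "home fixed phone"),
]


def preferred_terminal(rules, now, default=None):
    """Return the terminal whose window contains `now`, or `default` if none does."""
    for start, end, terminal in rules:
        if start <= now < end:
            return terminal
    return default


morning = preferred_terminal(RULES, time(10, 30))
evening = preferred_terminal(RULES, time(20, 0))
late = preferred_terminal(RULES, time(23, 0))
```

Returning a default (here `None`) for uncovered times leaves room for the case where the party is not reachable and an agent replies on his behalf.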

It may happen that a party cannot answer the meeting invitation because he is 
not reachable, or cannot reply immediately because, for instance, he is already on the 
phone or in another meeting. As a result, the final common decision would be 
delayed. It is assumed that a non-reachable party can, on some occasions, delegate his 
agenda policy to an agent program so that this agent can reply immediately to the 
meeting request. The corresponding party is informed of the meeting when he 
becomes reachable. 

A consistent view of the meeting time is guaranteed to all parties despite 
interruptions of the meeting fixing process and despite failures. 



5.2 The Customer Care Service 

The Customer Care service is intended to support customers by means of 
interactive product tutorials, solution proposals through question-and-answer 
assistance, online-operator assistance and other support services, such as "software 
patch download". 

A real application example for this service could be a software, hardware or 
electronic equipment company that supports the installation, configuration and 
usage of its products. 

In a usage scenario, after the user logs in and is successfully authenticated by the 
service, the Customer Care service offers several functionalities: 

• Product Tutorial: the user interactively accesses information about products and 
services from his service/product supplier, organized in a tutorial way. The 
information is basically organised as a multimedia slide session, where the user can 
navigate back, forward or repeat a specific slide. The user can also request the help 
of an online-operator through an audio/video conference. 

• Audio/Video Conference: the user can request the establishment of an audio/video 
conference with an online-operator. The Customer Care service should support the 
establishment of an Audio/Video conference between different types of terminals 
and over different types of networks. The intention is to have the possibility of 
establishing an audio conference between two mobile phones, an audio conference 
between an application in a PC and a mobile phone, a video conference using two 
PCs or an audio conference between a fixed phone and a mobile phone. 
transparently to the user. The audio/video conference connection could be made 
based upon the location of the user and the online-operators. 

• Questions & Answers: the user has the possibility to follow a questionnaire, 
answer some key questions, so that the service can conclude about the user’s 
problem and propose a solution. 

• Software/Information Download: the user can search for and download software 
patches, product manual PDFs, etc., and the online-operators can obtain information 
about the user's previous sessions. 

Besides these basic service specific functionalities, the Customer Care service should 
also enable the user to personalise his service usage, i.e. change the different profiles 
“seen” within the service context and suspend a service session that can be resumed 
later in time, in a different location, with different access network and terminal. 



6 Conclusions 

Although the Virtual Home Environment has been identified as a powerful and 
necessary concept for the future integrated service environment, little work has been 
done so far for its precise definition and validation. The VHE requirements will not be 
met without defining and specifying an architectural framework, enabling the 
realisation of a VHE for service provision and use. The VESPER project appeared 
with this key objective. 

The VESPER approach to reaching this objective is based on current trends and 
developments that take place in the context of the 3rd generation networks. To let 
service providers take full advantage of transparent network access, VESPER 
advocates the use of open APIs such as the OSA/Parlay frameworks. In the same way, 
VESPER defines an open API, available to service providers, that brings 
personalization, adaptation to terminals and session mobility to services provided in the 
context of a VHE provider. 

This framework provides many benefits to service providers and service 
developers. The use of these open APIs, with all the advantages of the VHE features, 
allows easy and rapid development of new sophisticated telecommunication services. 



Abstract. A Wireless Local Loop (WLL) such as LMDS typically has time-varying 
and high Bit Error Rate (BER) channel characteristics [1-3]. To provide 
Internet service over such a link, error protection mechanisms at the link layer are 
necessary [4-6]. However, since different services provided by higher layers 
require different link layer behavior, a multi-service link layer architecture is 
needed, as has been proposed in [7]. The latter assigns transmission bandwidth 
in proportion to network layer allocation, independent of arbitrary overhead 
rates. As a different approach, we propose a WLL link layer architecture and 
protocol that assign transmission bandwidths according to services' priorities 
and their actual bandwidth needs (taking into account network layer allocation 
as well as any overhead incurred). This new model keeps track of the current 
status of each link service, such as link service overhead rates, and informs the 
network layer. We argue that our link layer architecture and protocol more 
effectively support QoS services implemented at higher layers, especially those 
that require hard QoS guarantees (such as guaranteed service in the IntServ model 
[9] and Expedited Forwarding (EF) service in DiffServ [10]). 



1. Introduction 

A Wireless Local Loop (WLL) channel typically has time-varying and high Bit Error 
Rate (BER) channel characteristics. Providing Internet service, especially with QoS 
guarantees, therefore requires additional error protection mechanisms at the link layer 
[4-6]. In addition, the mixed traffic nature of the Internet requires that different error 
protection mechanisms be applied to different services. Finally, there should be a 
cooperative mechanism between the link layer and the network layer to effectively 
control the use of the limited link bandwidth by different Internet services, each 
with a different link layer overhead. Our proposed link layer architecture and protocol 
seek to answer these requirements. 

The rest of the paper is organized as follows. First we discuss the solutions proposed 
for providing QoS Internet service over time-varying, error-prone wireless 
links, highlighting the unsolved problems. We then propose a new link layer 
architecture and protocol as our solution to these QoS problems. We then 
consider an example of applying our model to simple Internet traffic comprising 
real-time UDP traffic and non-real-time TCP traffic. Finally, a short summary 
concludes the paper. 



2. Effect of a Time-Varying BER Link on Multi-service Internet 

Currently, many Internet services are built on the traditional TCP/IP protocol 
stack. The transport layer protocol TCP was designed with the observation that most 
losses on the Internet are due to congestion, as routers run out of buffers and discard 
incoming traffic. As a consequence, when running over lossy links, TCP 
assumes all the losses are due to network congestion and invokes its congestion 
avoidance mechanism, resulting in degraded throughput performance. 

Schemes to improve TCP performance over error-prone links have been well studied 
and summarised in [4-6]. According to [4], these schemes can be divided into three 
categories: end-to-end protocols (modifications made to TCP itself); link-layer 
protocols that provide reliability at the link layer; and split connection protocols that 
break the end-to-end connection into two parts at the base station. All these studies 
[4-6] show that link layer schemes are easier to implement and provide better 
performance than the other two options. 

However, it is expected that Internet traffic is and will always be of mixed origins: 
non-real-time traffic usually uses TCP as its transport layer, while real-time 
traffic may use UDP. Real-time traffic can usually accept some level of packet 
loss but is more sensitive to timely packet delivery. Forward Error 
Correction (FEC) is probably a good option for much of the real-time traffic. 
Non-real-time traffic, on the other hand, is more sensitive to packet loss rate 
but can tolerate variable delay. For the latter, link layer error protection 
mechanisms such as retransmissions with a local timeout are frequently used. If the 
wireless link layer is to handle both types of traffic simultaneously, it 
should be able to differentiate the traffic introduced from the higher layers and 
provide each with a suitable QoS behavior. In other words, a multi-service link layer 
is needed. 

Reference [7] introduced a multi-service wireless link layer architecture. The model 
tries to isolate the different services and to prevent one from interfering with the 
others. According to the model in [7], the link layer should allocate link layer 
bandwidth in the proportions presented to it by the network layer traffic requirements, 
independent of link layer overhead rates. However, there are two issues associated 
with this approach. 

The first results from the interaction between the rate adaptation mechanism 
employed by the TCP transport layer protocol and the link layer allocation strategy of 
[7]. When there is unused bandwidth, the TCP layer will ramp up its throughput to 
better utilise the available bandwidth. However, the increasing transport layer 
throughput will be reflected in an increasing network layer traffic demand on the link 
layer. In this scenario, with the allocation strategy of [7], the TCP stream will then be 
allocated a progressively increasing share of the link layer bandwidth. With TCP 
overhead rates lower than UDP overhead rates and with link layer allocation 
proportional to network layer traffic demand, the TCP stream will continue to 
increase its share of link layer bandwidth at the expense of the UDP stream. The TCP 
stream will grow until it 'hogs' the link layer bandwidth required by the UDP stream, 
and eventually the TCP stream will annex almost all of the link layer bandwidth. The 
non-rate-adaptive, higher-overhead UDP traffic will then have completely lost its link 
layer bandwidth allocation. 

The second issue is that the link model does not implement any prioritization at the 
link layer and relies on higher layers to do this. As a consequence, when the wireless 
link experiences severe interference and the effective link layer throughput 
drops, the performance of all services is equally degraded until the link status is 
updated and the higher layers reschedule services accordingly. 

To solve these two issues, while considering the three requirements mentioned in the 
Introduction, we propose a new link layer architecture and protocol that 
take into account differences in service overheads and allow different services to 
have different priorities. 



3. Proposed Link Layer Protocol 

The architecture for the new link protocol is shown in Figure 1. Packets passed from 
the network layer to the link layer are first classified and mapped into different 
link layer services based on their protocol fields (TCP or UDP ports), or based on the 
Type of Service/DiffServ Code Point field (if DiffServ is provided by the network 
layer). The link layer then fragments the packets. Error protection services such as 
Forward Error Correction or Retransmission are applied, and can be 
implemented differently for different services. Link layer data units (frames) are 
scheduled for transmission using a self-clocked Weighted Fair Queuing Scheduler 
(WFQS), with different priorities of being dropped in case of link difficulty. At the 
receiver side, frames are demultiplexed, reassembled and passed back to the 
network layer. 
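
The classification step can be illustrated with a short Python sketch. The constants are the standard IP protocol numbers for TCP and UDP and the standard DSCP value for Expedited Forwarding; the link service names and the choice to classify on the protocol number rather than on ports are assumptions made for this example only.

```python
IPPROTO_TCP = 6    # IP protocol number for TCP
IPPROTO_UDP = 17   # IP protocol number for UDP
DSCP_EF = 46       # DiffServ code point for Expedited Forwarding


def classify(protocol, dscp=None):
    """Map a packet to a link layer service name (the names are hypothetical).

    If a DSCP value is supplied, it takes precedence, as it would when
    DiffServ is provided by the network layer.
    """
    if dscp is not None:
        return "fec" if dscp == DSCP_EF else "retransmission"
    if protocol == IPPROTO_UDP:
        return "fec"              # real-time traffic: loss tolerant, delay sensitive
    if protocol == IPPROTO_TCP:
        return "retransmission"   # non-real-time traffic: delay tolerant, loss sensitive
    return "retransmission"       # assumed default for any other protocol


udp_service = classify(IPPROTO_UDP)
tcp_service = classify(IPPROTO_TCP)
ef_service = classify(IPPROTO_TCP, dscp=DSCP_EF)
```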

Compared to [7], two new function blocks are introduced in our model: network layer 
allocation measurement and link layer bandwidth consumption measurement. 
The allocation measurement block measures the traffic assigned by the network layer 
for different service classes. The consumption measurement block measures the actual 
bandwidth required at the link layer for each service type. That bandwidth includes 
the network layer payload as well as the overhead (FEC or Retransmission) added by 
the link layer. 

These two new function blocks aim to keep track of the percentage of overhead 
added by each link layer service. These overhead rates vary according to link state if 
retransmission or variable rate FEC is used for error protection; the higher the BER, 
the higher the overhead rate must be. The link layer will provide feedback to the 
network layer, informing it of the maximum network layer throughput, or "normalized 
capacity", $C_i$, that each service could achieve if it were operating alone. This "normalized 
capacity" is the total link capacity, $C$, divided by a factor of $(1 + a_i)$, where $a_i$ is the 
link layer overhead rate for service $i$. 
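
As a quick numerical illustration of the normalized capacity, the Python sketch below divides the total capacity by one plus the overhead rate. The function name and the sample values are illustrative assumptions.

```python
def normalized_capacity(total_capacity, overhead_rate):
    """Maximum network layer throughput of a service running alone on the link.

    total_capacity : raw link capacity C (any rate unit, e.g. kb/s)
    overhead_rate  : link layer overhead rate of the service (1.0 means 100%)
    """
    if overhead_rate < 0:
        raise ValueError("overhead rate cannot be negative")
    return total_capacity / (1.0 + overhead_rate)


# A service with 100% FEC overhead on a 1000 kb/s link can carry at most
# 500 kb/s of network layer traffic; with no overhead it can use the full link.
fec_capacity = normalized_capacity(1000.0, 1.0)
plain_capacity = normalized_capacity(1000.0, 0.0)
```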

A Self-Clocked Weighted Fair Queuing Scheduler (WFQS) schedules the next 
frames according to the service weights; these weights are equal to the desired raw 
link capacity for each service, i.e. the product of the allocation bandwidth measured during 
the last time interval and $(1 + a_i)$. In general, a higher value of $a_i$ implies a higher 
priority. When the link capacity decreases (for example due to a change of 
modulation scheme), indicating the chance of frame dropping, the frames with the lowest 
priority will be dropped before any higher priority frames. In other words, this 
model allows services with a higher priority to take bandwidth from lower priority 
services. Therefore, the higher priority services stand a better chance of not being 
affected by a temporary bad channel condition. Note that this frame dropping is 
a temporary measure by the link layer, in order to protect frames of higher priority 
from being dropped. In the longer term, the problem should be solved by the network layer 
adjusting its packet scheduling when it gets the link layer state update at the end of the 
next time interval. This protocol works as follows: 

If the total link bandwidth is $C$, the normalised link layer capacity for service $i$ is $C_i$, 
where 

$$C_i = \frac{C}{1 + a_i} \qquad (1)$$

and the allocation bandwidth introduced by the network layer for the corresponding 
service is $W_i$, then in order not to over-assign the link capacity, the allocation 
bandwidths and the normalised link capacities should meet the following requirement: 

$$\sum_i \frac{W_i}{C_i} < 1 \qquad (2)$$



If the requirement (2) is not satisfied, the network layer knows that the link layer is 
having difficulty supporting the current bandwidth allocation. The network layer will 
adjust the current bandwidth allocation, for example, by dropping TCP packets, and 
thereby triggering the congestion avoidance mechanism at transport layer, or by 
notifying the application of the real-time traffic to use a lower rate option. 
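
The check that the network layer performs against requirement (2) can be sketched as follows. Since each normalised capacity is the total capacity divided by one plus the overhead rate, the test amounts to summing allocation times (1 + overhead) over all services and comparing the result with the total capacity. The function names, the dictionary layout and the sample overhead values are assumptions for illustration.

```python
def link_load(allocations, overhead_rates, total_capacity):
    """Sum over services of W_i / C_i, with C_i = C / (1 + a_i)."""
    return sum(
        rate * (1.0 + overhead_rates[name]) / total_capacity
        for name, rate in allocations.items()
    )


def allocation_fits(allocations, overhead_rates, total_capacity):
    """True when the current allocation does not over-assign the link."""
    return link_load(allocations, overhead_rates, total_capacity) < 1.0


allocations = {"tcp": 300.0, "udp": 200.0}   # network layer allocations in kb/s

# Good channel: small retransmission overhead for TCP, 100% FEC for UDP.
good = allocation_fits(allocations, {"tcp": 0.02, "udp": 1.0}, 1000.0)

# Bad channel: the retransmission overhead has grown to 150% (illustrative value).
bad = allocation_fits(allocations, {"tcp": 1.5, "udp": 1.0}, 1000.0)
```

When the check fails, the network layer is the one that decides which allocation to reduce, for instance by dropping TCP packets as described above.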

In general, the network layer has better knowledge of the original traffic 
characteristics and can make a better decision in case of link difficulty than 
the link layer, which would otherwise just degrade the performance of all services equally. The service 
with the highest priority at the link layer can be mapped to guaranteed traffic (Expedited 
Forwarding behavior) provided by DiffServ [9], guaranteed flows in the IntServ model [10], 
or real-time UDP traffic in the current best effort Internet model. The service with the lowest 
priority can be mapped to Best Effort traffic in the DiffServ model or in the IntServ 
model, or to traffic such as FTP and e-mail in the current Internet. Different priorities can also 
be used to differentiate between users. 

In a dynamic bandwidth assignment environment, different users share the total 
bandwidth dynamically based on each user's current requirement. For such an 
environment, an example of a suitable QoS medium access control (MAC) for WLL 
is introduced in [8]. Our model, with different service queues and different priority 
levels, creates the platform for supporting such an implementation. The 
bandwidth is then fairly shared, without the bandwidth hogging that affects 
the model of [7]. 



4. A Mixed Service Example 

In this section, we consider an example of mixed real-time and non-real-time traffic; 
real-time traffic uses UDP as its transport protocol and non-real-time traffic uses TCP. 
The following assumptions are made: 

• Channel capacity: 1 Mb/s 

• Non-real-time offered traffic: 300 Kb/s 

• Real-time offered traffic: 200 Kb/s 

• Non-real-time traffic uses retransmissions as an error protection mechanism, with 
a maximum of 5 retransmissions. The average packet size for non-real-time 
traffic is 1000 bits/packet. 

• Real-time traffic uses FEC, with 100% overhead. 

• BPSK modulation scheme with coherent detection 

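First we estimate the overhead rate for the TCP traffic due to error-induced 
retransmission. According to [11], the relationship between the BER of the channel, 
$P_b$, and Eb/No is: 

$$P_b = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E_b}{N_0}}\right) \qquad (3)$$

Therefore the non-real-time frame error rate $P_f$, for a frame of $n$ bits with independent bit errors, is 

$$P_f = 1 - (1 - P_b)^n \qquad (4)$$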

The probability of a frame being retransmitted $k$ times ($0 \le k \le 4$) will be: 

$$P_r(k) = (1 - P_f)\,(P_f)^k \qquad (5)$$

and the probability of a frame being retransmitted 5 times (the maximum) will be: 

$$P_r(5) = (P_f)^5 \qquad (6)$$

From equations (3-6), the overhead rate for the non-real-time traffic using 
retransmission can be established using the following equation: 

$$a = \sum_{k=0}^{5} P_r(k) \times k \qquad (7)$$
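
The chain from Eb/No to the retransmission overhead can be traced numerically. The Python sketch below is an illustration under stated assumptions, not the authors' evaluation code: it uses the textbook BER expression for coherent BPSK over an additive white Gaussian noise channel, assumes independent bit errors within a frame, and takes the frame length as a free parameter. It is not claimed to reproduce the paper's graphs.

```python
import math


def ber_bpsk(ebno_db):
    """Bit error rate of coherent BPSK over AWGN (assumed textbook formula)."""
    ebno = 10.0 ** (ebno_db / 10.0)          # convert dB to a linear ratio
    return 0.5 * math.erfc(math.sqrt(ebno))


def frame_error_rate(bit_error_rate, frame_bits):
    """Probability that at least one bit is wrong, assuming independent bit errors."""
    return 1.0 - (1.0 - bit_error_rate) ** frame_bits


def retransmission_distribution(frame_error, max_retx=5):
    """Probabilities of a frame being retransmitted 0, 1, ..., max_retx times."""
    probs = [(1.0 - frame_error) * frame_error ** k for k in range(max_retx)]
    probs.append(frame_error ** max_retx)    # retransmission limit reached
    return probs


def retransmission_overhead(frame_error, max_retx=5):
    """Expected number of retransmissions per frame, used as the overhead rate."""
    dist = retransmission_distribution(frame_error, max_retx)
    return sum(k * p for k, p in enumerate(dist))


# Example: overhead when half of all frame transmissions fail.
overhead_at_half = retransmission_overhead(0.5)
```

A useful sanity check is that the distribution sums to one and that the overhead equals the sum of the first five powers of the frame error rate, which is 0.96875 for a frame error rate of 0.5.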

The overhead for the real-time traffic is 100% and is unchanged despite the channel 
characteristic variability, because it uses FEC for error protection. In practice, these 
service overhead rates are obtained from the network layer traffic allocation 
measurement and the link layer bandwidth consumption measurement, and are provided 
to the network layer regularly. 

The two graphs below illustrate how the network layer throughput and link layer 
bandwidth consumed by each of the TCP and UDP based services vary with channel 
characteristics (BER). The graphs show that, when the channel has an Eb/No greater 
than or equal to 6 dB, the 300 Kb/s of TCP traffic consumes just over 300 Kb/s of link 
layer bandwidth, while the 200 Kb/s of UDP traffic with its 100% FEC overhead 
consumes 400 Kb/s of link layer bandwidth. When the Eb/No drops below 
5 dB, it is the TCP traffic that experiences a drop in network layer 
throughput. Our protocol degrades the throughput of the lower priority TCP traffic but 
maintains the throughput of the higher priority UDP traffic. 

Note however that, unlike [7], because our protocol takes into account link layer
overhead in allocating link layer bandwidth, our protocol does not suffer from 
bandwidth hogging due to higher layer rate adaptation mechanisms. 



5. Conclusion 

In this paper we have proposed a new QoS supported multi-service link layer 
architecture and protocol, which is superior to that proposed in [7]. Our link layer 
model tries to preserve bandwidth allocation from the higher layer, while considering 
the overhead introduced by the link layer itself. Even in the case of severe 
interference, the link layer can still support hard QoS requirement for the more 
important services while degrading the performance of lower priority services. This is 
critical to support new QoS Internet services such as IntServ or DiffServ over highly 
unpredictable error prone links. We have also provided an example of carrying mixed
real-time and non-real-time traffic over the proposed link layer architecture and
analysed its behaviour. In ongoing research, we are implementing the proposed
multi-service link model for a range of traffic streams typical of a wide area ISP network.



[Figure: Link Layer Bandwidth Consumed versus Eb/No]

[Figure: Network Layer Throughput versus Eb/No]

Abstract. We study the problem of QoS guarantee for differentiated
services. A two-level hierarchical scheduling framework is deployed for 
the separation of the QoS metrics. Due to the desirable property of 
minimizing the maximum packet lateness, the Earliest Deadline First 
(EDF) scheduling is adopted to provide the in-class scheduling for the 
time-sensitive traffic. We propose to employ an EDF scheduler combined
with an active buffer management scheme (CHOKe) to improve the fairness
of resource allocation and to maintain a good delay performance
for all real-time applications. Simulation results show that the proposed
scheme achieves a better delay performance and a fairer
bandwidth allocation between the real-time TCP and UDP connections
than the First Come First Serve (FCFS) scheduling with the Drop-Tail
buffer management that is commonly deployed in traditional IP
routers.

Keywords: Scheduling, Earliest Deadline First, Active Buffer Management,
Real-time Traffic



1 Introduction 

With the development of digital technology, network convergence is occurring
at both the media and the technological levels. Recent achievements in the
fiber technology of Wavelength-Division Multiplexing (WDM) promise a large
amount of bandwidth in future high-speed networks, so that it is possible for
multimedia applications to run together with the traditional data-oriented
services within a single common network. The Internet has become
the dominant networking technology, yet multimedia communication with
Quality-of-Service (QoS) guarantees requires substantial changes in
the current Internet infrastructure.

The multimedia services, such as voice, video and other applications, demand
not only high bandwidth, but also a stringent real-time delay constraint,
because the value of the communication depends upon the time at
which messages are successfully delivered to the recipient. The above services are
commonly classified as either soft or hard real-time applications. Soft-real-time
applications can tolerate some amount of lost messages, while hard-real-time
applications have zero loss tolerance [?].

Internet telephony is one of the promising soft-real-time applications, which
can permit a few lost phonemes (critical units of speech) in continuous speech
[?]. On the other hand, in the web-based visualization application, large data
sets are processed in the central server and the results need to be transferred
back to the client side for visualization [?]. Due to the interactive nature of the
whole process, both a stringent delay bound and loss-free data transmission are required.
Further, the dominant application running on the current Internet is the World
Wide Web (WWW). The WWW supports a rich palette of media types, such as
pictures, audio and video, in contrast to the text-only Internet of the past. Web surfing is
a highly interactive activity that involves both lossless data communication
and loss-tolerant multimedia communication.

Because of the tight latency bound in the above multimedia applications, it
is suggested that they avoid a reliable transport like TCP, because
retransmission of a lost packet may cause packets to arrive too late
to be useful [?]. Instead, an unreliable transport, such as UDP, is recommended.
However, a hard-real-time application requires both stringent delay and zero
packet loss, so if it is built on an unreliable transport, the application
must deal with the missing packets itself, which increases the complexity of the
application software. The recently adopted Real-Time Transport Protocol (RTP)
follows this approach, providing mechanisms for dealing with impairments
such as jitter and loss, as well as for timing recovery and intermedia
synchronization. In the recently proposed framework of Differentiated Service (DiffServ),
besides the traditional best-effort traffic, two other traffic classes with different
Per-Hop-Behaviors (PHBs), Expedited Forwarding (EF) and Assured Forwarding
(AF), have been suggested to serve real-time applications. The transport in the
framework is not clearly defined yet, so it is interesting to see whether
hard-real-time applications can run over TCP. In this paper, the soft-real-time and
hard-real-time applications are proposed to run on UDP datagrams and TCP
flows respectively, and, given proper scheduling and buffer management
schemes, they can be better served in terms of delay and packet loss.

In order to support communication services with QoS guarantees, the network
resources need to be managed in a systematic manner. The FCFS-DropTail
scheme, which is commonly implemented in traditional IP routers, has some
problems in QoS provision. The FCFS policy can achieve a tight delay
bound by limiting the buffer size, but then packets are dropped with a larger
probability when they arrive at routers that do not have enough buffer space to store
them. The situation may be even worse if the traffic is highly bursty. On the
other hand, the Drop-Tail buffer management can result in global synchronization
among multiple TCP connections, which can underutilize the congested
link because several connections may halve their congestion windows at the same
time. Moreover, the FCFS-DropTail scheme cannot provide a fair resource
allocation among TCP and UDP flows [?].



The EDF Scheduling with Active Buffer Management 






In the past several years, new scheduling schemes have been proposed. They are
basically variants of two fundamental disciplines, Generalized Processor Sharing
(GPS) and Earliest Deadline First (EDF). The packetized version of the GPS
scheduler (PGPS) can guarantee the minimum per-connection throughput and
delay bound with flow protection, but it needs to maintain per-connection state
and reserve a large bandwidth for a small delay bound. Because the EDF scheduler
has the desirable property of minimizing the maximum lateness of packets [?],
EDF has an advantage over PGPS for scheduling real-time traffic in terms of
system scalability and utilization. On the other hand, TCP performance can be
improved by active queue management [7], such as Random Early Detection
(RED), in which the average queueing delay can be controlled while transient
queue-size fluctuation is allowed. However, like Drop Tail, RED is also unable to
penalize unresponsive flows [?]. Recently, a stateless active queue management
scheme embedded in RED, named CHOKe, has been proposed to work with
FCFS scheduling to approximate fair bandwidth allocation; it tries
to bridge fairness and simplicity in the scheduler [12].
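
The paper does not spell out the CHOKe admission rule, so the following is only a sketch of the scheme as it is commonly described in the literature: an arriving packet is compared with one randomly chosen queued packet, and both are dropped if they belong to the same flow. The function name, the `max_p` parameter and the tuple layout are assumptions made for illustration, and the physical queue limit is left to the caller.

```python
import random
from collections import deque


def choke_admit(queue, packet, avg_qlen, min_th, max_th, max_p=0.1, rng=random):
    """Return True if `packet` is enqueued; may also evict one queued packet.

    `queue` is a deque of (flow_id, payload) tuples. `avg_qlen` is the
    RED-style average queue length, assumed to be maintained by the caller.
    `max_p` is a hypothetical RED maximum drop probability.
    """
    if avg_qlen < min_th:
        queue.append(packet)            # no congestion: always admit
        return True
    if queue:
        idx = rng.randrange(len(queue))
        if queue[idx][0] == packet[0]:  # same flow as the random candidate
            del queue[idx]              # drop the queued candidate ...
            return False                # ... and the arriving packet
    if avg_qlen >= max_th:
        return False                    # heavy congestion: drop the arrival
    # Between the thresholds, drop with a RED-like linear probability.
    drop_p = max_p * (avg_qlen - min_th) / (max_th - min_th)
    if rng.random() < drop_p:
        return False
    queue.append(packet)
    return True
```

Because a flow that occupies most of the buffer is the most likely to be drawn as the comparison candidate, an unresponsive flow is penalized without any per-flow state.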

Many studies have examined scheduling schemes and buffer
management schemes separately. In fact, a queue management strategy can
be used along with any scheduling scheme, so it has been suggested to study them in
an integrated fashion in order to get the optimal performance [?]. In this paper,
the EDF scheme is proposed to schedule the real-time traffic in order to get
the optimal delay performance, whereas the active buffer management scheme CHOKe
works together with the EDF scheduling to improve the fairness of resource
allocation between UDP-based and TCP-based real-time communications.

The paper is structured as follows. In Section 2, we introduce the two-level
hierarchical scheduling framework for supporting differentiated traffic in
the Internet, and describe how the EDF scheduler cooperates with the active
buffer management scheme (CHOKe). The simulation-based performance
evaluation is carried out in Section 3. Finally, we conclude the paper and present
future work in Section 4.



2 Hierarchical Scheduling Framework 

The hierarchical scheduling aims to meet the goals of sharing link capacity and
providing differentiated services, such as real-time service, best-effort service, and
others. Link-sharing was first proposed in [?], in which network resources are
shared among traffic streams that are grouped according to administrative
affiliation, protocol, traffic type, or other criteria. The concept of link-sharing
is implemented as a particular resource management scheduling scheme called
Class Based Queueing (CBQ). In CBQ, the user traffic is organized into a tree,
or hierarchy, of classes, and traffic classes are differentiated by the network. The
FCFS-DropTail scheme is still suggested to serve the in-class packets due to its
simplicity. Bandwidth allocation is the major concern in CBQ;
however, multimedia applications also require a tight delay bound. Subsequently,
the fluid Hierarchical Generalized Processor Sharing (H-GPS) system was proposed
as a general and flexible framework to support hierarchical link sharing and
traffic management for different classes [?]. The H-GPS scheme provides a more
fine-grained link-sharing structure, which can provide a guaranteed end-to-end
delay bound for a session if the traffic in that session is leaky-bucket constrained;
however, this delay bound is very conservative. On the other hand, without a
large bandwidth reservation, applications under EDF can still obtain
a small delay bound by assigning packets more urgent time-tags as their
deadlines. So in this paper the EDF scheduler or its variants are proposed to
replace the corresponding GPS scheduler for time-sensitive traffic in the H-GPS
framework.

Based on the above discussion, the following hierarchical scheduling framework
is introduced in order to fairly allocate bandwidth among classes in the
higher level by the GPS scheduler and to maintain the particular throughput
or delay QoS guarantee by the proper selection of the GPS or EDF scheduler
in the lower level (see Figure 1). For scalability reasons, the scheduling
framework tries to provide service with a class-based QoS. There are two
types of schedulers in the framework, the general scheduler and the link-sharing
scheduler. The link-sharing scheduler allocates bandwidth among classes and the
general scheduler tries to serve a traffic class with its allocated bandwidth share.
They determine exclusively the packet scheduling in the absence of congestion.
In the presence of congestion, the link-sharing scheduler controls the scheduling
of the packets from different classes. The GPS link-sharing scheduler can ensure
that each interior or leaf class in the multilevel structure will receive its allocated
bandwidth over appropriate time intervals, and distribute any “excess”
bandwidth fairly among classes.




The traditional best-effort traffic will still run on the future Internet and
more services may emerge. In this paper, we consider only
how to provide service for the best-effort and real-time traffic in the two-level
hierarchical scheduling framework, which is depicted in Figure 2. If one
more traffic class appears, it should be easy to add one more branch to the
hierarchical scheduler tree.




Fig. 2. Hierarchical Scheduling Structure 



To simplify the problem of how the EDF scheduler cooperates with the
CHOKe scheme, we focus our attention only on the EDF-CHOKe branch in
the tree. The real-time traffic class is guaranteed a minimum bandwidth in the
worst case by the upper PGPS scheduler. If the best-effort traffic has
not used up its bandwidth, the PGPS scheduler can allocate this excess bandwidth
to the real-time traffic, which improves the QoS provision for the real-time traffic.

In the later section, comparison studies are conducted for the various
combinations of the service policies (FCFS and EDF) and buffer management
schemes (DropTail, RED and CHOKe) in the simulation experiments.

3 Performance Evaluation 

3.1 Simulation Configuration 

The simulations are carried out in the Network Simulator [?]. Consider the
following simulation scenario with a single congested link, as shown in Figure 3,
to study how much bandwidth a single nonadaptive UDP source can obtain when
routers use different schemes. The congested link in the network is between the
routers R1 and R2. The link, with a capacity of 1 Mbps, is shared by 1 UDP
and 32 TCP flows. Each source and destination node is connected to the router
using a 10 Mbps link, which is ten times the bottleneck link bandwidth. All
the links have a small propagation delay of 1 ms so that the delay experienced
by a packet is mainly caused by the queueing delay in the buffer rather than
the transmission delay or propagation delay. Different scheduling and buffer
management schemes are deployed in the congested link for the comparison
studies. The maximum window size of TCP is set to 300 so that it does not
become the limiting factor of the TCP flows’ throughput. The TCP flows are
derived from FTP sessions which transmit a very large file, and the UDP
source sends packets at a constant bit rate of 2 Mbps, so the link between the
routers R1 and R2 becomes the bottleneck link in the network. All the packets
are assumed to have the same fixed size of 1000 bytes.




Fig. 3. Network Topology 



The FCFS-DropTail, FCFS-RED, FCFS-CHOKe, EDF-DropTail, EDF-RED
and EDF-CHOKe schemes are studied for comparison. The minimum threshold
min_th in both RED and CHOKe is set to 100 packets, and the maximum threshold
max_th is set to twice min_th. The physical queue size in the above
schedulers is set to 300 packets. The delay requirements for both UDP and TCP
flows are set to 600 ms.
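
To relate these settings to the 600 ms delay requirement, the short sketch below converts queue lengths into drain times on the 1 Mbps bottleneck with 1000-byte packets. It is an illustrative back-of-the-envelope calculation, assuming the queueing delay under FCFS is simply the backlog multiplied by the per-packet transmission time; the names are not from the paper.

```python
LINK_BPS = 1_000_000        # bottleneck capacity from Section 3.1 (1 Mbps)
PACKET_BITS = 1000 * 8      # fixed packet size of 1000 bytes


def drain_time_s(packets: int) -> float:
    """Time to transmit `packets` back-to-back packets on the bottleneck link."""
    return packets * PACKET_BITS / LINK_BPS


def packets_within_ms(deadline_ms: int) -> int:
    """Largest FCFS backlog (in packets) that still drains within the deadline."""
    return deadline_ms * LINK_BPS // (1000 * PACKET_BITS)


if __name__ == "__main__":
    for label, n in (("min_th", 100), ("max_th", 200), ("physical queue", 300)):
        print(f"{label:>14}: {n} packets -> {drain_time_s(n):.1f} s")
    print("backlog that fits 600 ms:", packets_within_ms(600), "packets")
```

Under this assumption a backlog at the minimum threshold already takes 0.8 s to drain, which is longer than the 600 ms requirement, so a scheduler that ignores deadlines can be expected to violate them once the queue builds up.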

3.2 Simulation Results 




Fig. 4. UDP Throughput Comparison 



The throughputs of the UDP flow under different schemes (FCFS-DropTail,
FCFS-RED, FCFS-CHOKe and EDF-CHOKe) are plotted in Figure 4.
Figure 4 clearly shows that the FCFS-DropTail and FCFS-RED schemes do
not discriminate against the unresponsive UDP flow. The UDP flow takes
more than 95% of the bottleneck link capacity and all the TCP connections can
only share the remaining 5% of the bandwidth. The FCFS-CHOKe scheme provides
a fairly good resource allocation, in which the total TCP goodput is
around 750 Kbps, while the EDF-CHOKe scheme has a slightly lower TCP goodput
of about 700 Kbps. The individual throughputs of the 33
connections in the FCFS-CHOKe and EDF-CHOKe schemes along with
their ideal fair shares are plotted in Figure 5.




Fig. 5. Per Flow Throughput Comparison (legend: Ideal, FCFS-CHOKe, EDF-CHOKe)



To provide a quantitative comparison, we adopt the concept of the fairness
index [?]. The fairness index always lies between 0 and 1, with
1 representing the greatest fairness. Based on the results in Table 1, we can see
that neither FCFS nor EDF working together with DropTail or RED can
provide a fair bandwidth allocation. Though the EDF-CHOKe scheme is not
fairer than the FCFS-CHOKe scheme, with active buffer
management like CHOKe, the EDF scheduling achieves much better fairness
than the traditional EDF with the Drop-Tail buffer scheme and the EDF with
the active buffer management of RED.
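
The text does not give the formula for the fairness index. A common choice with exactly this 0-to-1 range is Jain's fairness index, and the DropTail and RED values of about 0.03 are close to 1/33, which is what that index yields when one of 33 flows takes nearly all the bandwidth. The sketch below assumes Jain's index; it is not confirmed to be the authors' exact definition.

```python
def jain_fairness_index(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2), equal to 1 when all x are equal."""
    xs = list(throughputs)
    if not xs:
        raise ValueError("at least one flow is required")
    square_sum = sum(x * x for x in xs)
    if square_sum == 0:
        raise ValueError("at least one flow must have non-zero throughput")
    total = sum(xs)
    return total * total / (len(xs) * square_sum)


if __name__ == "__main__":
    equal_shares = [1.0] * 33                 # ideal allocation among 33 flows
    one_flow_hogs = [1.0] + [0.0] * 32        # a single flow takes everything
    print(jain_fairness_index(equal_shares))   # ideal case
    print(jain_fairness_index(one_flow_hogs))  # worst case, 1/33
```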

The packet delay distributions at the congested link between R1 and R2
for the TCP connections in the FCFS-CHOKe and EDF-CHOKe schemes are
plotted in Figure 6. Because the FCFS-CHOKe scheme makes the scheduling
decision without considering the delay constraint of the communication, most
of the TCP packets in FCFS-CHOKe suffer deadline violation, while most of the
TCP packets in EDF-CHOKe are transmitted within their delay constraint (0.6 sec).

Table 1. Fairness Index Comparison

Scheme          Fairness Index
Ideal           1.0
FCFS-DropTail   0.0305
FCFS-RED        0.0309
FCFS-CHOKe      0.3744
EDF-DropTail    0.0304
EDF-RED         0.0304
EDF-CHOKe       0.2838




(a) FCFS-CHOKe (b) EDF-CHOKe 

Fig. 6. Packet Delay Distribution of the TCP Connections 



Furthermore, the statistics of the packet delay in the congested link
between R1 and R2 for the FCFS-CHOKe and EDF-CHOKe schemes are tabulated
in Table 2. As seen clearly in the table, because UDP traffic aggressively
captures bandwidth from the TCP flows, it has a good delay
performance, while TCP traffic receives very different treatment in terms of
delay in the two schemes. In Table 2, avg refers to the mean packet
delay and std refers to its standard deviation.

We have investigated the effectiveness of the above scheduling schemes in
terms of throughput and delay separately; however, these two principal network
metrics are closely related. To describe this relationship, we adopt the power
of the network [?], which is the ratio of the throughput to the delay.
The powers for the TCP and UDP connections in the FCFS-CHOKe and EDF-CHOKe
schemes are shown in Table 3. Note that the powers of the TCP connections
are calculated with the mean-of-ratio method introduced in [?].
Generally, as much throughput and as little delay as possible are desired,
so the higher the power index, the more effective the scheme. Table 3 shows
that the powers for both the TCP connections and the UDP connection in EDF-CHOKe
are larger than those in FCFS-CHOKe.

Table 2. The Delay Statistics (sec)

              Total TCP Traffic     UDP Traffic
Scheme        avg       std         avg       std
FCFS-CHOKe    0.9378    0.1203      0.0859    0.2743
EDF-CHOKe     0.5665    0.0398      0.0476    0.1564



Table 3. Power Comparison

Scheme        TCP connections    UDP connection
FCFS-CHOKe    2.45 × 10^4        3.44 × 10^6
EDF-CHOKe     3.78 × 10^4        7.65 × 10^6
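
The power metric can be illustrated with a small sketch. It reads "mean of ratio" as the average of the per-connection throughput/delay ratios, which is an interpretation rather than something the text defines; the function names and the sample numbers in the usage lines are illustrative only.

```python
def power(throughput: float, delay: float) -> float:
    """Power of a single connection: throughput divided by delay."""
    if delay <= 0:
        raise ValueError("delay must be positive")
    return throughput / delay


def mean_of_ratios(throughputs, delays) -> float:
    """Average of per-connection powers (assumed meaning of 'mean of ratio')."""
    throughputs, delays = list(throughputs), list(delays)
    if not throughputs or len(throughputs) != len(delays):
        raise ValueError("need one delay per throughput, and at least one flow")
    ratios = [power(t, d) for t, d in zip(throughputs, delays)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical per-connection values (bits/s and seconds), not paper data.
    print(mean_of_ratios([20_000.0, 25_000.0], [0.5, 0.6]))
```

Averaging the ratios weights every connection equally, whereas dividing the aggregate throughput by the mean delay would let the largest flows dominate.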



Based on the figures and tables above, we can see that the EDF-CHOKe
scheme can maintain a good delay performance as well as make a fairer
bandwidth allocation between real-time TCP and UDP connections.

Although the EDF-CHOKe scheme performs better than the traditional
FCFS-DropTail scheme, the former is more complex than the latter in
terms of implementation and control. The complexity arises because the EDF-CHOKe
scheme has to select the packet with the smallest deadline for transmission
on the link. The scheduler needs to maintain a priority list of deadlines, and
each insertion into or deletion from this list has a complexity of O(log K) operations,
where K is the number of packets awaiting transmission. In addition, the CHOKe
buffer management adds further operational complexity because the
average queue length must be calculated for the dropping decision whenever a new
packet arrives.
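
The O(log K) behaviour described above is what a binary heap keyed on deadlines provides. The following is a minimal sketch using Python's standard `heapq` module; the class name and the way a deadline is formed (arrival time plus delay budget) are assumptions for illustration, not the simulator code used in the paper.

```python
import heapq
import itertools


class EDFQueue:
    """Earliest-deadline-first queue backed by a binary heap."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order per deadline

    def push(self, arrival_time: float, delay_budget: float, packet) -> None:
        """Insert a packet; O(log K) for K queued packets."""
        deadline = arrival_time + delay_budget
        heapq.heappush(self._heap, (deadline, next(self._seq), packet))

    def pop(self):
        """Remove and return (deadline, packet) with the smallest deadline."""
        deadline, _, packet = heapq.heappop(self._heap)
        return deadline, packet

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = EDFQueue()
    q.push(0.0, 0.6, "tcp-1")
    q.push(0.1, 0.2, "udp-1")   # arrives later but is more urgent
    while len(q):
        print(q.pop())
```

The sequence counter avoids comparing packet objects when two deadlines are equal, so packets with identical deadlines leave in arrival order.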



4 Conclusion 

With more and more real-time multimedia applications running on the current
Internet, the next generation Internet is expected to support a wide range of
applications with heterogeneous QoS requirements. In this paper, a hierarchical
scheduling framework is proposed to maintain a particular throughput
or delay QoS guarantee for multimedia applications by building a
two-level hierarchical scheduling structure in which the GPS scheduler in the
higher level inter-operates with the EDF scheduler in the lower level.

In the above framework, the Earliest Deadline First (EDF) scheduler is proposed
to schedule the real-time traffic. Simulation results show that the proposed
EDF scheduler working with the active buffer management scheme achieves
a better delay performance and at the same time a fairer bandwidth
allocation between real-time TCP and UDP connections than the First Come
First Serve (FCFS) scheduler with the Drop-Tail buffer management.

There are still many interesting issues for future study. First, a theoretical
model for the EDF scheduling should be developed so that it can cooperate with
the GPS scheduler in the higher hierarchy, which may result in a better traffic
management scheme. For example, if there are too many overdue packets in a
real-time traffic class, it may be necessary to increase the weight of this session in the
GPS scheduler so as to increase its received bandwidth. Second, in the simulation
study, the UDP traffic is modeled as a constant bit rate stream, but recent
studies show that Internet traffic is self-similar [?]. Finally, the proposed
scheduling schemes are only studied in a simple network scenario with one congested
link. We also intend to explore these mechanisms in more complicated scenarios
with multiple congested links.

Abstract. As the Internet evolves into a global commercial infrastructure,
there is a growing need to support more enhanced services than
the traditional best-effort service. This paper describes an architecture 
of provisioning guaranteed Internet services. Unlike conventional QoS 
mechanisms such as Intserv and Diffserv, this architecture enables In- 
ternet resource management through market forces, e.g., pricing. A de- 
tailed analysis of two scenarios of the underlying IP network has been 
presented: ECN capable and Diffserv. We concentrate on the pricing 
model, its implications for traffic management (e.g., admission control) 
and possible example services. A simulation framework is also discussed. 



1 Introduction 

The current Internet is based on the so-called best-effort model where all packets
are treated equally and the network tries its best to achieve reliable data delivery.
Although this simple model is very easy to implement, it has a number of
undesirable consequences as the Internet evolves towards a multi-service
network with heterogeneous traffic and diverse quality of service (QoS) requirements.
The flat rate charging structure attached to the best-effort model has
undoubtedly contributed to the growing problem of congestion on the Internet.
Since the network resources are completely shared by all users, the Internet
tends to suffer from the well-known economic problem of the “tragedy of the commons”.
Greedy users will try to grab as many resources as possible, leading
to an unstable system and eventually congestion collapse. The Internet has
been successful until now because most end systems use TCP congestion control
mechanisms and back off during congestion. However, as the number of TCP-unfriendly
users increases, such dependence on the end systems’ cooperation is
becoming unrealistic. The lack of explicit bandwidth policing and delay guarantees
in the current Internet also prevents Internet service providers (ISPs) from
creating flexible packages to meet the different needs of their customers [?].

As the Internet evolves into a global commercial infrastructure, there is a
growing need to support more enhanced services than the traditional best-effort
service. To address this issue, there have been intensive efforts in the IETF (Internet
Engineering Task Force) to develop a new class of service models called
Differentiated Services or Diffserv models [?]. The key difference between the previously
proposed Integrated Services (Intserv) models [?] and Diffserv is that while
Intserv provides end-to-end quality of service on a per-flow basis, Diffserv is intended
to provide long-term service differentiation among the traffic aggregates
to different users. In particular, Diffserv pushes the complexity to the network
edge, and requires only very simple priority scheduling/dropping mechanisms inside
the core. As the number of Internet users keeps growing, the Diffserv solution
is more suitable because it scales well with an increasing number of network users
and it does not alter the current Internet paradigm much.

This paper describes an architecture for provisioning guaranteed Internet services.
Unlike existing QoS mechanisms such as Intserv and Diffserv, this architecture
enables Internet resource management through market forces, e.g., pricing.
It is part of a next generation network system currently being developed in the
EU funded M3I Project (Market Managed Multi-service Internet)¹. A detailed
analysis of two scenarios of the underlying IP network is presented: ECN
capable and Diffserv. We concentrate on the pricing model, its implications for
traffic management (e.g., admission control) and possible example services. A
simulation framework is also discussed.

2 Guaranteed Service Provider 

Here we consider a scenario where a type of guaranteed service is provided to end 
users that incorporates and extends the classical telephony-like service. Typical 
applications are those with stringent real-time requirements, such as real-time 
audio and video services. Their utility functions look like step functions, i.e.,
as soon as the bandwidth share drops below that needed to meet the required 
delay bounds, the performance falls sharply to zero. Admission control is often 
necessary for this kind of service. 

In M3I, the guaranteed service model consists of two cooperating stakeholders
[?]: one stakeholder providing a basic communication mechanism and the other
refining it into guaranteed services. At one end, the basic service
could be a dynamically priced best-effort service (like the current Internet) with no
quality guarantees. At the other extreme, no refinement is needed and the basic
provider delivers all relevant guarantees directly. In between these extremes, the
basic provider can deliver various combinations of price and/or service guarantees.
It is envisaged that creating two separate economic entities (the basic
Internet service provider (ISP) and the guaranteed service provider (GSP)) would
be more convenient for the explicit economic modeling of the provisioning of
service guarantees.

The idea of providing guaranteed service over a best-effort IP network
infrastructure is based on the work of Gibbens and Kelly [?]. The reference model
for a guaranteed service provider is shown in Figure 1. Here we just give a brief
introduction to the architecture. For more details (e.g., detailed explanation of
¹ Part of this work was done while the author was on a BT Fellowship at BT Labs,
working on the M3I Project.

risk broker, clearinghouse, etc.), please refer to [?]. The GSP can be viewed as a
“layer” similar to that in the OSI model. It is assumed that information streams
go through the GSP on their way from the ISP to the customer (via the interfaces
I1, I2, etc.). Inside the GSP domain there are two components: the clearinghouse
and the risk broker. In an environment where the network infrastructure offers no
QoS/price guarantees but users nevertheless dynamically adapt to price or quality
signals, a risk broker is needed. It buys dynamically priced communication
services with varying qualities from an ISP, and sells transport services with
guarantees and simple end-to-end prices to end customers. Meanwhile,
the function of a clearinghouse is to gather the charges made to all end customers
of some communication, and to redistribute those charges according to some
agreement among the end customers.




Fig. 1. The GSP reference model (adapted from [5]) 



There is a range of relations and associated requirements between the ISP
and the GSP. Here we consider only two of them.



2.1 Scenario 1 

Here the guaranteed service model is similar to that specified in [?]. For traffic
conforming to the traffic descriptor and within the agreed maximum duration,
the GSP guarantees that packets are delivered between endpoints within a given
time limit with a given probability. There are two levels of business interactions.

Interface I1/I2 business interactions. The tariffs for the guaranteed service
(charged to the end customer) will be of the general form C = aT + bV + c,
where T is time and V is information volume. Details of this “abc” charging
scheme can be found in [?].

Interface I3/I4 business interactions. The ISP offers to the GSP a best-effort
datagram service with an ECN marking scheme based on congestion pricing.
The tariff to be used for communication is a charge per ECN mark received.
Charges apply to the receiver of packets.

2.2 Scenario 2 

The difference between this scenario and the previous one is that the service
offered by the ISP to the GSP is a Diffserv-like prioritized transfer of datagrams
specified by service level agreements (SLAs). Therefore, differences only exist
in Interfaces I3/I4. The sender is free to choose the priority level on an individual
packet basis. SLAs can, however, be dynamically renegotiated. The tariff to be
used is a charge per volume in each Diffserv class, and the charge per volume
increases with priority.

The model of Figure 1 is unique in that the GSP is an economic entity
independent of the ISP. In the following sections, we present detailed analyses of
the GSP with respect to price interactions, QoS mapping and admission control.

3 The GSP Charging Model 

There are N customers (applications) in the system indexed by i. Assume the
revenue the GSP receives from customer i is given by

$$C_i = a(F_i, Q_i)\,T_i + b(F_i, Q_i)\,V_i + c \tag{1}$$

where F is a set of traffic parameters and Q is a set of quality of service
specifications in the traffic contract. T is the duration of the connection and V is
the volume. The coefficients a, b, c can be determined from the traffic descriptors
as described in [?]. Assume that in Scenario 1 the total number of ECN marks
received for connection i is $G_i$ and the price per mark is $P_m$, set by the ISP.
Hence the optimisation problem for the GSP is

$$\text{maximise} \quad \sum_i \left( C_i - G_i P_m \right) \tag{2}$$

over charges a, b, c, subject to the customers’ QoS constraints $Q_i$.

To achieve QoS guarantees the GSP implements call admission control (CAC).
Assume that the GSP acts as a CAC gateway that probes the network and accepts
or rejects incoming calls based on ECN marks [?]. If the marking probability
$P^m$ is below some threshold A, it accepts a call request. Here A is determined
by $F_i, Q_i$. For example, if $Q_i$ is the packet loss probability, then the
smaller $Q_i$ is, the smaller A is. So $G_i$ is also a function of $F_i, Q_i$.
On the other hand, a smaller A means a higher blocking probability $P^b$, and
hence a smaller number of connections that the GSP accepts.
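
The admission rule just described can be sketched as follows. This is only an illustration under stated assumptions: the probe is represented by counts of sent and marked packets, the marking probability is estimated as their ratio, and the names `ProbeResult`, `marking_probability` and `admit_call` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of probing the ISP network before a call is admitted."""
    packets_sent: int
    packets_marked: int  # probe packets that came back with an ECN mark


def marking_probability(probe: ProbeResult) -> float:
    """Estimate the marking probability as the fraction of marked probe packets."""
    if probe.packets_sent <= 0:
        raise ValueError("probe must contain at least one packet")
    return probe.packets_marked / probe.packets_sent


def admit_call(probe: ProbeResult, threshold: float) -> bool:
    """Accept the call only if the estimated marking probability is below A."""
    return marking_probability(probe) < threshold


# A stricter QoS contract maps to a smaller threshold A, so fewer calls pass.
lightly_loaded = ProbeResult(packets_sent=1000, packets_marked=5)
congested = ProbeResult(packets_sent=1000, packets_marked=20)
print(admit_call(lightly_loaded, threshold=0.01))  # 0.005 < 0.01
print(admit_call(congested, threshold=0.01))       # 0.020 >= 0.01
```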

Given a general knowledge of incoming traffic ($F_i$ and $Q_i$) and the ISP's
resources (capacity, buffer, etc.), the GSP should set A just small enough to
satisfy $Q_i$ so that it can accept more calls. The conjecture is that the
revenue from extra customers would exceed the extra cost (marks) incurred*.
However, if a customer's QoS is occasionally violated and the customer's
utility decreases, the GSP may have to pay a penalty. The worst consequence
would be losing customers. Therefore, the GSP has to maintain a good balance
here by means of careful planning and provisioning. This provisioning (among
other resource optimisations) should happen on a longer time scale.

* We later show this conjecture may not be correct using a simple example.

3.1 Customer Model 

The customer solves the problem 

$$\text{maximise} \quad U(F_i, Q_i) - C_i \tag{3}$$

over Fi, Qi. U is customer’s utility function. Here traffic descriptors could be of 
many different kinds, e.g., a descriptor based on the token bucket. 

Although the above GSP model does not seem to be mathematically tractable, 
it can still give us some insights into such issues as price setting, provisioning 
and the interaction between customers, the GSP and ISP. We will discuss these 
issues in the following sections. 

4 GSP Provisioning for ECN Charging (Scenario 1) 

Given a loss probability (QoS) of $P_{loss}$ and the resource capacity of the
ISP, the GSP can determine the marking probability $P_{mark}$ under a specific
ECN marking scheme, which is normally proportional to $P_{loss}$. On the basis
of this, the GSP can determine the threshold of admission control decisions and
hence obtain estimates of $P_{block}$ and the maximal load†. Under the "abc"
charging scheme [?], the per unit time charge is given by

$$C = a(l, m) + b(l, m)\,M \tag{4}$$

where M is the measured mean rate, and l and m are parameters in the effective
bandwidth formula.

To remain profitable, the GSP has to ensure that 

$$C > P_{mark}\, M\, p_m \tag{5}$$

where $p_m$ is the price per mark set by the ISP.

As can be seen from the above discussion, if a user has quite stringent QoS
requirements (say, a very small $P_{loss}$), the charge components a and b must
naturally be high due to the property of effective bandwidth. On the other
hand, $P_{mark}$ must be small (otherwise the GSP would reject this
connection). So it seems quite obvious that (5) will be satisfied. However,
this is at the expense of stricter admission control, which may turn away many
other connection requests. Thus, the GSP's total revenue $\sum_i C_i$ may not
be high.

† It is similar to the case of capacity provisioning using Erlang's formula in
traditional telephone networks.

To see the problem more clearly, we consider a simple example that illustrates
the trade-off the GSP has to make. Assume homogeneous connections and that
$P_{mark}$ increases exponentially with the number of active connections‡,
i.e., $P_{mark}(n) \approx h e^{gn}$, where g, h are small positive numbers.
Hence, the maximum number of connections $n_{max}$ that the GSP can accept at a
profit satisfies the following equation:

$$nC = h e^{gn}\, n M p_m \tag{6}$$

Since $n_{max} = \frac{1}{g} \log \frac{C}{h M p_m}$, $n_{max}$ increases with C
and decreases with g. This also indicates that admission control is indeed
necessary at the GSP because $n e^{gn}$ grows faster than $nC$ as n increases.
Note that here admission control is not only an engineering decision (to
provide satisfactory QoS to users), but also an economic one (to make profit
for the GSP). Alternatively, the GSP may want to maximise its net income by
solving:

$$\max_n \; nC - n h e^{gn} M p_m \tag{7}$$

It is easy to show that the optimal $n^*$ satisfies the following equation:

$$C - h e^{gn^*} M p_m - n^* g h e^{gn^*} M p_m = 0 \tag{8}$$

Note that $n^* < n_{max}$, as shown in Figure 2. This can be proved as follows.
From (8), we have $C = h \exp(gn^*) M p_m + n^* g h \exp(gn^*) M p_m$. Then we
have $n_{max} = n^* + \frac{1}{g} \log(n^* g + 1) > n^*$. If the GSP wants to
have positive net revenue, it should not accept more than $n_{max}$ calls. On
the other hand, it may accept only $n^*$ calls to make maximum profit. However,
this will result in a higher blocking rate, and its impact on the business
relations between the GSP and its customers is an open question.
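
The trade-off in this example can be checked numerically. The sketch below assumes the exponential marking model $P_{mark}(n) \approx h e^{gn}$, computes the break-even point of (6) in closed form and finds the root of (8) by bisection. The parameter values and the function names are illustrative assumptions, and the sketch requires $C > h M p_m$ so that the break-even point is positive.

```python
import math


def n_max(C, M, p_m, g, h):
    """Break-even number of connections from (6): C = h*exp(g*n)*M*p_m."""
    return math.log(C / (h * M * p_m)) / g


def net_income(n, C, M, p_m, g, h):
    """Objective of (7): revenue n*C minus the cost of ECN marks."""
    return n * C - n * h * math.exp(g * n) * M * p_m


def n_star(C, M, p_m, g, h, tol=1e-10):
    """Root of (8), located by bisection on the interval [0, n_max]."""
    def derivative(n):
        return C - h * math.exp(g * n) * M * p_m * (1.0 + g * n)

    lo, hi = 0.0, n_max(C, M, p_m, g, h)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if derivative(mid) > 0.0:
            lo = mid  # net income still increasing: optimum lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative (assumed) values, not taken from the paper.
params = dict(C=1.0, M=1.0, p_m=1.0, g=0.05, h=0.01)
n_be = n_max(**params)
n_opt = n_star(**params)
print(f"n_max = {n_be:.2f}, n* = {n_opt:.2f}")
```

With these values the profit-maximising number of connections is well below the break-even number, which is the gap the text attributes to the economic side of admission control.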



Fig. 2. GSP admission control and charging (axis label: Price)

‡ This is a rough assumption derived from the simulation results of [?].

5 GSP Provisioning for the Diffserv Scenario (Scenario 2)

In Diffserv networks, packets are classified into a small number of aggregated 
flows or “classes”, based on the Diffserv codepoint (DSCP) in the packet’s IP 
header. This is known as behaviour aggregate (BA) classification. At each Diff- 
serv router, packets are subjected to a “per-hop behaviour” (PHB), which corre- 
sponds to the DSCP. The PHB of a behaviour aggregate is distinguished by the 
DSCP and/or the source/destination address, and source/destination port num- 
ber. PHBs are implemented at routers by some buffer management and packet 
scheduling mechanisms. The primary benefit of Diffserv is its scalability, because
Diffserv eliminates the need for per-flow state and per-flow processing.

Assume the ISP implements Diffserv with priorities. There are N different
classes, with Class 1 having the highest priority. Each priority i is linked
with a price per volume, $p_i$, with $p_1 > p_2 > \ldots > p_N$. The ISP
advertises to the GSP a set of typical QoS specifications associated with each
class (e.g., expected capacity [?]). Then the GSP chooses a priority (thereby
forming a service level agreement) on the basis of several factors: the
customer's QoS requirements, traffic characteristics, ISP charges, and the
GSP's revenue from the customer.

There exists the issue of service mapping. Guaranteed service requests could
specify an Intserv service type and a set of quantitative parameters known as
a “flowspec”. Requests for guaranteed services must be mapped onto the
underlying capabilities of the Diffserv network. Aspects of the mapping
include [?]:

— selecting an appropriate PHB, or set of PHBs, for the requested service; 

— performing appropriate policing (including shaping or remarking) at the 
edges of the Diffserv region; 

— exporting Intserv parameters from the Diffserv region; 

— performing admission control on the Intserv requests that takes into account 
the resource availability in the Diffserv region. 

There is some standard, well-known mapping from Intserv service type to a
DSCP that will invoke the appropriate behaviour in the Diffserv network.

Based on this service mapping and the pricing structure of the ISP, the GSP 
maintains a table (static or dynamic) including the following information: host’s 
IP address and port number, QoS parameters and traffic characteristics, the 
minimum priority needed to satisfy QoS, ISP prices for this priority, and the 
revenue that the GSP gets from the customer. It can be viewed as a contract 
between the GSP and the ISP. For a static contract, the general structure (con- 
tents) of this database may be varied over a long time scale (say, monthly). The 
GSP can choose the term of the contract, i.e., either long-term fixed contract or 
short-term flexible contract. In the latter case, the GSP can increase or decrease 
priorities dynamically. For this kind of dynamic SLA, the GSP functions like a 
bandwidth broker with a signalling protocol such as RSVP to request for services 
on demand. For instance, suppose a user normally has a static contract with
the GSP for ftp and email applications. The GSP has assigned a low Diffserv
priority to this contract, say, Class 3. Now the user wants to use a real-time
video service for a short period, and Class 3 is evidently insufficient. So the
GSP negotiates with the ISP for a temporary higher priority (say, Class 1) and
charges the user more. The pricing policy for this kind of ad-hoc service
upgrade may be different from the default pricing structure.
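
One way to picture the table lookup behind this example is sketched below. It is an illustration only: the single QoS metric (a delay bound), the prices and the names `DiffservClass` and `cheapest_sufficient_class` are assumptions, and a real GSP table would hold the richer set of fields listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DiffservClass:
    priority: int        # 1 is the highest priority
    price_per_mb: float  # ISP price per volume for this class (assumed values)
    max_delay_ms: float  # QoS advertised by the ISP for this class (assumed metric)


def cheapest_sufficient_class(
    classes: List[DiffservClass], required_delay_ms: float
) -> Optional[DiffservClass]:
    """Return the cheapest class whose advertised delay meets the requirement."""
    candidates = [c for c in classes if c.max_delay_ms <= required_delay_ms]
    return min(candidates, key=lambda c: c.price_per_mb, default=None)


isp_classes = [
    DiffservClass(priority=1, price_per_mb=0.30, max_delay_ms=20.0),
    DiffservClass(priority=2, price_per_mb=0.10, max_delay_ms=100.0),
    DiffservClass(priority=3, price_per_mb=0.02, max_delay_ms=500.0),
]

ftp_class = cheapest_sufficient_class(isp_classes, required_delay_ms=1000.0)
video_class = cheapest_sufficient_class(isp_classes, required_delay_ms=50.0)
print(ftp_class.priority, video_class.priority)
```

Because prices increase with priority, the cheapest sufficient class is also the minimum priority that satisfies the request, which is the quantity the GSP table records.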

To support multiple levels of service, the network provider posts a set of
different prices, $p_i$, i = 1, 2, ..., N. The number of priorities may depend
on the population of customers and the desired granularity of resource unit.
The ISP also maintains a volume meter for each class for accounting purposes.
Similar to telephone networks, the provider could set different prices for
these priorities at different times of day, e.g., peak time, evening, weekend,
etc. In this case, $p_{weekend} < p_{evening} < p_{peak}$. However, as network
traffic is much more bursty and unpredictable than traditional telephone
traffic, factors other than time or distance have to be taken into account as
well. The detailed pricing policies that the ISP uses may vary from flat rate
to usage-based or congestion-based, or a combination of the above. The ISP also
maintains a table that indicates the transmit capacity provisioned at each
Diffserv service level. This table, in conjunction with the service mapping, is
used to make admission control decisions.

In the above scenario, the actual communication between hosts, the GSP, 
and the ISP to obtain end-to-end QoS could be based on RSVP. 

5.1 Example Service 

To support guaranteed delivery or bounded delays in a Diffserv network, it seems 
that the expedited forwarding (EF) PHB is a natural choice. The EF PHB 
can be viewed as a virtual leased line service and is implemented as a priority 
queue that is serviced before all other queues. A number of traffic engineer- 
ing mechanisms are needed for EF. First, the Diffserv network must be over- 
provisioned with respect to the EF traffic that it admits. Secondly, admission 
control must be performed at the edges of the network. Thirdly, edge routers 
must shape all EF traffic so that their negotiated peak rate is never exceeded. 
Experiments have shown that EF can provide a low-loss, low-delay and low-jitter 
end-to-end service. 

In this case, the GSP’s role is simpler compared to the case of ECN. It is 
mainly responsible for service mapping. The edge router of the ISP performs 
CAC. If there is not enough capacity for a new EF request, the GSP can either 
ask the user to delay his request or downgrade the request to a lower priority 
(say, best-effort). Since the EF traffic is limited and shaped to a contracted 
peak rate, it is relatively easy for the ISP to set price. It is also possible for the 
GSP to work out its own budget and pricing policies. EF traffic would normally 
be allocated a small percentage of the total network capacity and priced much 
higher. The GSP can purchase this amount of bandwidth from the ISP at a 
wholesale price. It will then set (higher) charges a, b, c to sell this EF capacity
to multiple customers and make a profit.



Pricing and Provisioning for Guaranteed Internet Services



6 Simulation Experiments 

A simulator for guaranteed service provisioning is currently under development 
within MSI. In this section, we discuss briefly several design considerations of 
this simulator. 

In our simulator, the GSP contains a QoS agent that keeps track of the transmit
capacity currently available in the router, as well as users’ QoS requirements.
This information is used to perform admission control. The QoS agent also mon- 
itors various QoS metrics and is responsible for QoS re-negotiation. The GSP 
also has a pricing agent which is responsible for metering, accounting, pricing 
and charging. Here our main focus is on pricing, which involves setting prices 
(e.g., a,b,c) both dynamically and statically. As discussed before, the pricing 
agent also interacts with the QoS agent. 

At the user side, users try to maximise their net utilities by carefully selecting 
QoS and traffic parameters. An intelligent agent may be present to estimate 
user’s utility. In the mean time, a user may change his contract with the GSP 
depending on how he values the quality he receives. The open question is how 
to determine utility functions. 

The ISP may implement ECN or Diffserv. In the ECN scenario, we could consider
the virtual queue marking method [?] or the simpler threshold-based marking
scheme. The ISP charges the GSP a flat rate fee plus congestion costs based on
ECN marks. In the Diffserv scenario, the router maintains a set of queues where
each queue represents a behaviour aggregate (BA). Each BA provides a specific
QoS. The router adds an arriving packet to the proper BA according to the DSCP
(which is determined by the GSP using service mapping). The router meters the
traffic in each class and charges by traffic volume accordingly.
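
The Diffserv metering step could be modelled in a simulator along the following lines. This is a sketch under assumptions, not the simulator described here: the class name `ClassVolumeMeter`, the per-byte prices and the packet sizes are all hypothetical.

```python
from collections import defaultdict
from typing import Dict


class ClassVolumeMeter:
    """Per-class byte counter with a price per byte for each Diffserv class."""

    def __init__(self, price_per_byte: Dict[int, float]) -> None:
        self.price_per_byte = dict(price_per_byte)
        self.volume_bytes: Dict[int, int] = defaultdict(int)

    def on_packet(self, diffserv_class: int, size_bytes: int) -> None:
        """Account one arriving packet to the class selected by its DSCP."""
        if diffserv_class not in self.price_per_byte:
            raise KeyError(f"no price configured for class {diffserv_class}")
        self.volume_bytes[diffserv_class] += size_bytes

    def total_charge(self) -> float:
        """Charge by traffic volume, summed over all classes."""
        return sum(self.price_per_byte[c] * v for c, v in self.volume_bytes.items())


# Assumed prices: the higher-priority class 1 costs more per byte than class 3.
meter = ClassVolumeMeter({1: 3e-6, 3: 1e-6})
for cls, size in [(1, 1500), (3, 1500), (3, 500)]:
    meter.on_packet(cls, size)
print(dict(meter.volume_bytes), meter.total_charge())
```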



7 Conclusion 

This paper describes an architecture of provisioning guaranteed Internet services. 
Unlike existing QoS mechanisms such as Intserv and Diffserv, this architecture 
enables Internet resource management through market forces, e.g., pricing. We 
have discussed the GSP model and presented a detailed analysis of two scenarios
of GSP provisioning: an ECN-capable Internet and a Diffserv Internet. We
concentrate on the pricing model, its implications for traffic management (e.g., 
admission control), service quality contract, and possible example services. A 
simulation framework is also discussed.

Abstract. For data delivery systems such as video-on-demand service,
an optimum design is presented to maximize the revenue of the system
with priority classes. The willingness-to-pay (WTP) is introduced as a
measure of utility (price), and the optimum design is discussed to
maximize the revenue. For the system with two priority classes, the optimum
condition is given in terms of the traffic load, waiting time for service
and pricing for the priority and non-priority classes. In this paper, we
use the WTP as the measure of pricing, and we examine the optimum design
to propose optimum pricing methods for time-base and flat-base pricing.
Using the WTP, the utility (price) of the data delivery system has been
quantified, and the optimum condition to maximize the revenue of the
system has been analyzed. From the numerical examples, the optimum
condition for the service grade (waiting time for service) and pricing
for priority and non-priority classes has been discussed.



1 Introduction 

For data delivery systems such as video-on-demand service, an optimum design 
is presented to maximize the revenue of the system with priority classes. The 
willingness-to-pay (WTP) is introduced for a measure of utility (price), and 
the optimum design is discussed to maximize the revenue. For the system with 
two priority classes, the optimum condition is given in terms of the traffic load, 
waiting time for service and pricing for the priority and non-priority classes. 

Section 2 describes the traffic model to evaluate the waiting time for service, 
Section 3 introduces an approximate function for the willingness-to-pay, and 
Sections 4 and 5 describe the optimum design to propose the optimum pricing 
methods for time-base and flat-base. Section 6 summarizes the results obtained 
and gives concluding remarks.

Fig. 1. Preemptive priority traffic model (up link; legend: i, priority class
(1, 2, ..., Q); $\lambda_i$, data arrival rate; $N_i$, number of users)



2 Traffic Model 

In this paper, we consider a preemptive priority model. Fig. 1 shows the model
with Q priority classes. Suppose that higher priority class data preempt lower
priority class data in service, if any, and that the preempted remaining data
are transmitted afterward. This model is called the preemptive resume model.

Assume that requests for data delivery occur at random (Poisson distribution)
and that sufficiently large (infinite) buffers are provided. Using the
preemptive resume priority model, we investigate the waiting time until the
data delivery service starts. If the data are transmitted immediately upon the
request, the request rate is equivalent to the data arrival rate in Fig. 1.

The mean waiting time $W_i$ for priority class i to start the service is given
by [?, p.82]

$$W_i = \frac{\sum_{j=1}^{i} \lambda_j h_j^{(2)}}{2 \left( 1 - \sum_{j=1}^{i-1} \rho_j \right) \left( 1 - \sum_{j=1}^{i} \rho_j \right)} \tag{1}$$

where $\lambda_j$ is the data arrival rate (request rate), $h_j$ the mean net
data transmission time (without interruption) and $h_j^{(2)}$ the second moment
of the net data transmission time of priority class j, and
$\rho_j = \lambda_j h_j$.

If the data transmission time is exponentially distributed, the net
transmission time for the remaining data follows the same exponential
distribution according to the Markov property [?, p.7]. In this case, we have
the relation

$$h_j^{(2)} = 2 h_j^2.$$
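
As a hedged numerical companion to this section, the sketch below evaluates the standard mean waiting time of a preemptive-resume priority queue with Poisson arrivals, $W_i = \sum_{j \le i} \lambda_j h_j^{(2)} / \bigl(2 (1 - \sum_{j<i} \rho_j)(1 - \sum_{j \le i} \rho_j)\bigr)$, which is assumed here to be the formula the text refers to as (1). The function name and the sample loads are illustrative.

```python
from typing import List, Sequence


def mean_waiting_times(
    arrival_rates: Sequence[float],
    mean_times: Sequence[float],
    second_moments: Sequence[float],
) -> List[float]:
    """Mean waiting time to start service for classes 1..Q (class 1 = highest)."""
    waits = []
    residual = 0.0    # sum of lambda_j * h_j^(2) over classes 1..i
    load_above = 0.0  # sum of rho_j over classes 1..i-1
    for lam, h, h2 in zip(arrival_rates, mean_times, second_moments):
        residual += lam * h2
        load_upto = load_above + lam * h
        if load_upto >= 1.0:
            raise ValueError("total load must stay below 1 Erlang")
        waits.append(residual / (2.0 * (1.0 - load_above) * (1.0 - load_upto)))
        load_above = load_upto
    return waits


# Two classes, exponential transmission times with mean h = 1 (so h^(2) = 2).
h = 1.0
w1, w2 = mean_waiting_times([0.1677, 0.4648], [h, h], [2 * h * h, 2 * h * h])
print(f"W1 = {w1:.4f}, W2 = {w2:.4f}")
```

For these loads the two values come out close to 0.20 and 2.07 in units of h, the waiting times quoted later in the numerical examples.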



Price Optimization of Contents Delivery Systems with Priority



Although in actual systems, data may be packetized, the packet level opera- 
tion is neglected in this paper, and data are assumed to be transmitted contin- 
uously unless interrupted. 

3 Measure of Utility and Pricing 

Let us introduce the “Willingness-to-Pay” (WTP) as a measure of utility [?] for
service. The WTP is one of the methods of the social sciences, and means the
price that a user is willing to pay for a certain service [?]. The WTP is used
to estimate the utility of service. In this paper, we use the WTP as the
measure of pricing.

Denoting by U the WTP (price) and by W the mean waiting time until the
service is available, it is assumed that the rate of increase of the WTP,
$dU/U$, is proportional to the rate of decrease of the mean waiting time,
$-dW/W$. Then, we have the relation

$$\frac{dU}{U} = -k \frac{dW}{W} \tag{2}$$

where k is a proportional coefficient. By integration, we have

$$\log U = C - k \log W \tag{3}$$

where C is an integration constant. Setting $D = \exp C$, we have

$$U = D W^{-k}. \tag{4}$$



The parameter k may be statistically estimated by opinion tests, while D may
be determined to balance the revenue and the system cost. The parameter k
reflects the structure of the user’s decision making. Generally, if k is nearly
1, the user’s utility (service satisfaction) depends heavily on the waiting
time for data. On the other hand, if k is nearly 0, the waiting time has little
or no influence on the user’s utility. It is assumed in this paper that the
value of k is independent of D. This means that only the relative value (ratio)
of the WTP (price) is relevant.



4 Proposed Pricing Methods 

4.1 Time-Base Pricing Method 

Let us consider a simple preemptive priority model with two classes, the
priority class and the non-priority class. Assuming that the data volume to be
delivered varies randomly with the same mean H [Mbyte] for the two classes, the
net data transmission time in the network with bitrate c [Mbps] is
approximately exponentially distributed with mean h = H × 8/c [sec] and second
moment $h^{(2)} = 2h^2$.

Letting the data arrival rates of the priority and non-priority classes be
$\lambda_1$ and $\lambda_2$, respectively, the traffic loads in Erlang
(occupancy) of the respective classes are given by

$$\rho_1 = \lambda_1 h, \quad \rho_2 = \lambda_2 h. \tag{5}$$


Denote the mean waiting times from the service request until the first part of
the data is received, for the priority class and the non-priority class, by
$W_1$ and $W_2$, respectively. Then, from (1) we have

$$W_1 = \frac{\rho_1 h}{1 - \rho_1}, \quad W_2 = \frac{\rho h}{(1 - \rho_1)(1 - \rho)} \tag{6}$$

where

$$\rho = \rho_1 + \rho_2 \tag{7}$$

is the total traffic load (occupancy) of the network.

Applying (4), the WTPs $U_1$ and $U_2$ for the priority and non-priority
classes, respectively, are approximated by

$$U_1 = D W_1^{-k}, \quad U_2 = D W_2^{-k}. \tag{8}$$


In the time-base pricing system, set $U_1$ and $U_2$ as the priority and
non-priority class prices per unit time, respectively. For the priority class,
the mean revenue per request is given by

$$U_1 h = \frac{8 U_1 H}{c} \tag{9}$$

which is proportional to the volume H of the data transmitted. Since
$\lambda_1$ is the number of requests per unit time for the priority class, its
revenue per unit time is given by

$$\lambda_1 U_1 h = \rho_1 U_1. \tag{10}$$

In a similar way, the revenue of the non-priority class per unit time is given
by

$$\lambda_2 U_2 h = \rho_2 U_2. \tag{11}$$

Hence, we have the total revenue per unit time,

$$R = \rho_1 U_1 + \rho_2 U_2. \tag{12}$$


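From the relation

$$\frac{\partial R}{\partial \rho} = 0, \quad \frac{\partial R}{\partial \rho_1} = 0 \tag{13}$$

the optimum condition to maximize R in (12) is given by (see Appendix 1)

$$\frac{1}{\rho - \rho_1} = \frac{k}{\rho (1 - \rho)}, \quad \left( \frac{1-k}{\rho_1} - \frac{k}{1 - \rho_1} \right) f_1 = \left( \frac{1}{\rho - \rho_1} + \frac{k}{1 - \rho_1} \right) f_2 \tag{14}$$

where

$$f_1 = \rho_1 U_1, \quad f_2 = \rho_2 U_2. \tag{15}$$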
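
The optimum can also be located by brute force, which gives an independent check on the analytic condition. The sketch below assumes the two-class waiting times $W_1 = \rho_1 h/(1-\rho_1)$ and $W_2 = \rho h/((1-\rho_1)(1-\rho))$ that follow from exponential transmission times, takes D = 1 and h = 1, and scans a grid of $(\rho, \rho_1)$ pairs; the grid step and the function names are assumptions.

```python
def revenue(rho: float, rho1: float, k: float, D: float = 1.0, h: float = 1.0) -> float:
    """Time-base revenue R = rho1*U1 + rho2*U2 with U = D * W**(-k)."""
    w1 = rho1 * h / (1.0 - rho1)
    w2 = rho * h / ((1.0 - rho1) * (1.0 - rho))
    u1 = D * w1 ** (-k)
    u2 = D * w2 ** (-k)
    return rho1 * u1 + (rho - rho1) * u2


def grid_search(k: float, step: float = 0.002):
    """Scan 0 < rho1 < rho < 1 on a regular grid and return (R, rho, rho1) at the best point."""
    best = (float("-inf"), 0.0, 0.0)
    steps = int(round(1.0 / step))
    for i in range(1, steps):
        rho1 = i * step
        for j in range(i + 1, steps):
            rho = j * step
            r = revenue(rho, rho1, k)
            if r > best[0]:
                best = (r, rho, rho1)
    return best


best_R, best_rho, best_rho1 = grid_search(k=0.5)
print(f"rho = {best_rho:.3f}, rho1 = {best_rho1:.3f}, R/D = {best_R:.4f}")
```

A grid search is slow and only as accurate as its step, so it is useful mainly as a sanity check; the stationarity conditions remain the practical route to the optimum.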



4.2 Flat-Base Pricing Method 



Denote the number of users of the priority and non-priority classes by $N_1$
and $N_2$, respectively, and the request rates (number of requests per unit
time) per user of the respective classes by $\sigma_1$ and $\sigma_2$. Assume
that $\sigma_1$ and $\sigma_2$ are constant values that do not depend on the
waiting time or the price in the flat-base pricing. The traffic loads in Erlang
(occupancy) of the respective classes, $\rho_1$ and $\rho_2$, are given by

$$\rho_1 = N_1 \sigma_1 h, \quad \rho_2 = N_2 \sigma_2 h. \tag{16}$$

Setting priority and non-priority flat-base prices (e.g. per month)
corresponding to $U_1$ and $U_2$, respectively, the revenue R is given by

$$R = N_1 U_1 + N_2 U_2. \tag{17}$$

If the total number of users N is defined as

$$N = N_1 + N_2 \tag{18}$$

the condition to maximize R is given by

$$\frac{\partial R}{\partial N} = 0, \quad \frac{\partial R}{\partial N_1} = 0. \tag{19}$$

Using the same notation as in the time-base method, from (19) we have the
optimum condition (see Appendix 2)

$$\frac{1}{\rho - \rho_1} = \frac{k}{\rho (1 - \rho)}, \quad \left( \frac{1-k}{\rho_1} - \frac{k}{1 - \rho_1} \right) f_1 = r \left( \frac{1}{\rho - \rho_1} + \frac{k}{1 - \rho_1} \right) f_2 \tag{20}$$

where $r = \sigma_1/\sigma_2$ (a constant value). In the case of r = 1, (20) is
equivalent to (14).

5 Example Calculations

5.1 Time-Base Pricing Method 

Fig. 2 shows the graph of R/D for k = 0.5. In the case of k = 0.5, from (14), we
have the optimum condition to maximize R as follows:

The traffic load $\rho$ = 0.633 erl ($\rho_1$ = 0.168 erl and $\rho_2$ = 0.465
erl) gives the maximum revenue R/D = 0.697.

The ratios of mean waiting time and price (WTP) are:

$$\frac{W_1}{W_2} = \frac{0.202}{2.07} \approx \frac{1.00}{10.2}, \quad \frac{U_1}{U_2} = \frac{2.23}{0.695} = \frac{3.19}{1.00}.$$

Fig. 2. Graph of R/D (k = 0.5)



For example, with a mean data volume H = 10 Mbyte and a network bitrate
c = 8 Mbps, we have h = H × 8/c = 10 sec. Hence, the mean waiting times for
starting service are $W_1$ = 2.02 sec and $W_2$ = 20.7 sec for the priority and
non-priority classes, respectively. If the price for the non-priority class is
set at 100 Yen per 10 Mbyte of data, the corresponding price for the priority
class becomes 319 Yen, according to the ratio of the price (WTP).

5.2 Flat-Base Pricing Method 

In the case of k = 0.5, Table 1 shows the optimum condition for r = 0.5, 1.0
and 2.0 calculated from (20). The case of r = 1.0 is the same as in the
time-base pricing.

Table 1. Optimum Condition

r     ρ        ρ1       R/D      W1       W2       U1       U2
0.5   0.7380   0.3513   0.5702   0.5416   4.3427   1.3588   0.4799
1.0   0.6325   0.1677   0.6968   0.2015   2.0683   2.2279   0.6953
2.0   0.5391   0.0421   1.1003   0.0440   1.2209   4.7688   0.9050
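
The rows of Table 1 can be reproduced approximately with a short solver. The sketch below is one possible numerical procedure, not necessarily the authors': it assumes the optimum condition has the form $1/(\rho-\rho_1) = k/(\rho(1-\rho))$ together with $((1-k)/\rho_1 - k/(1-\rho_1)) f_1 = r\,(1/(\rho-\rho_1) + k/(1-\rho_1)) f_2$, eliminates $\rho$ through the closed form (27) of Appendix 1, and bisects on $\rho_1$ with D = 1 and h = 1. The quantity returned as the third value is $f = f_1 + r f_2$, which appears to be what the R/D column reports.

```python
import math


def rho_from_rho1(rho1: float, k: float) -> float:
    """Positive root of rho**2 - (1-k)*rho - k*rho1 = 0, i.e. equation (27)."""
    return 0.5 * ((1.0 - k) + math.sqrt((1.0 - k) ** 2 + 4.0 * k * rho1))


def f1(rho1: float, k: float) -> float:
    """f1 = rho1*U1 with W1 = rho1/(1-rho1), D = 1, h = 1."""
    return rho1 ** (1.0 - k) * (1.0 - rho1) ** k


def f2(rho: float, rho1: float, k: float) -> float:
    """f2 = rho2*U2 with W2 = rho/((1-rho1)*(1-rho)), D = 1, h = 1."""
    w2 = rho / ((1.0 - rho1) * (1.0 - rho))
    return (rho - rho1) * w2 ** (-k)


def residual(rho1: float, k: float, r: float) -> float:
    """Left side minus right side of the second optimum condition."""
    rho = rho_from_rho1(rho1, k)
    lhs = ((1.0 - k) / rho1 - k / (1.0 - rho1)) * f1(rho1, k)
    rhs = r * (1.0 / (rho - rho1) + k / (1.0 - rho1)) * f2(rho, rho1, k)
    return lhs - rhs


def solve_optimum(k: float, r: float, tol: float = 1e-12):
    """Bisection on rho1 in (0, 1-k); returns (rho, rho1, f1 + r*f2)."""
    lo, hi = 1e-9, 1.0 - k
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, k, r) > 0.0:
            lo = mid
        else:
            hi = mid
    rho1 = 0.5 * (lo + hi)
    rho = rho_from_rho1(rho1, k)
    return rho, rho1, f1(rho1, k) + r * f2(rho, rho1, k)


for r in (0.5, 1.0, 2.0):
    rho, rho1, f = solve_optimum(k=0.5, r=r)
    print(f"r = {r}: rho = {rho:.4f}, rho1 = {rho1:.4f}, f = {f:.4f}")
```

The bisection relies on the residual being positive for very small $\rho_1$ and negative at $\rho_1 = 1 - k$; this holds for the three cases tabulated here but is not proved in general.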



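(1) Case of r = 0.5: The priority class request rate is half of the
non-priority request rate. The ratios of mean waiting time and price (WTP) are:

$$\frac{W_1}{W_2} = \frac{0.542}{4.34} \approx \frac{1.00}{8.01}, \quad \frac{U_1}{U_2} = \frac{1.36}{0.480} = \frac{2.83}{1.00}.$$

When h = 10 sec and $\sigma_1$ = 1.0/hr, then $\sigma_2$ = 2.0/hr, and the
optimum numbers of users are given by

$$N_1 = \frac{\rho_1}{\sigma_1 h} = 0.351 \times \frac{3{,}600}{1.0 \times 10} = 126.5 \approx 126$$

$$N_2 = \frac{\rho_2}{\sigma_2 h} = 0.387 \times \frac{3{,}600}{2.0 \times 10} = 69.60 \approx 69.$$

(2) Case of r = 2.0: The priority class request rate is twice the non-priority
request rate. The ratios of mean waiting time and price (WTP) are:

$$\frac{W_1}{W_2} = \frac{0.044}{1.22} \approx \frac{1.00}{27.7}, \quad \frac{U_1}{U_2} = \frac{4.77}{0.905} = \frac{5.27}{1.00}.$$

When h = 10 sec and $\sigma_1$ = 1.0/hr, then $\sigma_2$ = 0.5/hr, and the
optimum numbers of users are given by

$$N_1 = \frac{\rho_1}{\sigma_1 h} = 0.0421 \times \frac{3{,}600}{1.0 \times 10} = 15.16 \approx 15$$

$$N_2 = \frac{\rho_2}{\sigma_2 h} = 0.497 \times \frac{3{,}600}{0.5 \times 10} = 357.8 \approx 357.$$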
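
The conversion from optimum loads to user counts is simple arithmetic and can be sketched as below. The helper name is hypothetical; the loads are taken from Table 1 (with $\rho_2 = \rho - \rho_1$), the request rates are per hour and h is in seconds, and the results are rounded down as in the text.

```python
import math


def optimum_users(load_erl: float, requests_per_hour: float, h_seconds: float) -> float:
    """N = rho / (sigma * h), with sigma converted from per hour to per second."""
    sigma_per_second = requests_per_hour / 3600.0
    return load_erl / (sigma_per_second * h_seconds)


# Case r = 0.5: sigma1 = 1.0/hr, sigma2 = 2.0/hr, h = 10 sec.
n1_half = optimum_users(0.3513, 1.0, 10.0)
n2_half = optimum_users(0.7380 - 0.3513, 2.0, 10.0)

# Case r = 2.0: sigma1 = 1.0/hr, sigma2 = 0.5/hr, h = 10 sec.
n1_double = optimum_users(0.0421, 1.0, 10.0)
n2_double = optimum_users(0.5391 - 0.0421, 0.5, 10.0)

print(math.floor(n1_half), math.floor(n2_half))
print(math.floor(n1_double), math.floor(n2_double))
```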



Figs. 3 and 4 show the effect of the values of k and r. Fig. 3 indicates the
optimum ratio, $W_1/W_2$, of waiting time for service for the priority and
non-priority classes, which has a hump around k = 0.7. Fig. 4 shows the optimum
ratio of price (WTP), $U_1/U_2$.

Fig. 3. Optimum ratios of waiting time (axis label: Parameter k)

Fig. 4. Optimum ratios of price (WTP) (axis label: Parameter k)

6 Conclusions

In this paper, using the WTP (Willingness-to-Pay), the utility (price) of the
data delivery system has been quantified, and the optimum condition to maximize
the revenue of the system has been analyzed. From the numerical examples, the
optimum condition for the service grade (waiting time for service) and pricing
for priority and non-priority classes has been discussed.

The assumptions and parameters for the WTP formula (4) need to be verified
statistically. As mentioned in Section 3, it is assumed that the value of k is
independent of the value of D in the utility measure function. This point
should be clarified by opinion tests, etc. Even if more sophisticated formulas
for the WTP are used, the framework of this paper may be applied in a similar
manner. Although the request rates per user are assumed constant in the
flat-base pricing, they might be influenced by the price and waiting time;
analyses of such a model are also under study.

A fixed value of network bitrate c and exponential service time are assumed
in (6) for estimating the mean waiting time for service. The assumption may be
applicable to a network with a bandwidth reservation scheme such as CBR
(constant bit rate) in ATM (asynchronous transfer mode), or Diffserv in the
Internet. Even for the case of the bitrate varying randomly with mean c, the
results of this paper may be applicable as long as the data transmission time
is approximately exponentially distributed.

Appendix 1: Derivation of Equation (14)

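From the second equation in (13),

$$\frac{\partial R}{\partial \rho_1} = \frac{\partial f_1}{\partial \rho_1} + \frac{\partial f_2}{\partial \rho_1} = 0. \tag{21}$$

Since $f_1 = \rho_1 U_1 = D (1 - \rho_1)^k \rho_1^{1-k}$ (with time measured in
units of h), taking the logarithm we have

$$\log f_1 = \log D + k \log(1 - \rho_1) + (1 - k) \log \rho_1. \tag{22}$$

Differentiating (22) w.r.t. (with respect to) $\rho_1$ yields

$$\frac{\partial f_1}{\partial \rho_1} = f_1 \frac{\partial}{\partial \rho_1} \log f_1 = \left( \frac{1-k}{\rho_1} - \frac{k}{1 - \rho_1} \right) f_1. \tag{23}$$

In a similar way,

$$\log f_2 = \log D + k \log(1 - \rho_1) + k \log(1 - \rho) + \log(\rho - \rho_1) - k \log \rho. \tag{24}$$

Differentiating (24) w.r.t. $\rho_1$, we have

$$\frac{\partial f_2}{\partial \rho_1} = -\left( \frac{k}{1 - \rho_1} + \frac{1}{\rho - \rho_1} \right) f_2. \tag{25}$$

Substituting (23) and (25) in (21), the second equation in (14) follows.
Noting that $\partial f_1 / \partial \rho = 0$ since $W_1$ in (6) includes no
$\rho$, from the first equation in (13) we obtain

$$\frac{\partial R}{\partial \rho} = \frac{\partial f_2}{\partial \rho} = f_2 \frac{\partial}{\partial \rho} \log f_2 = \left( \frac{1}{\rho - \rho_1} - \frac{k}{\rho (1 - \rho)} \right) f_2 = 0. \tag{26}$$

Assuming $f_2 \neq 0$, we get the first equation in (14).

For numerical computation, solving the first equation in (14) w.r.t.
$\rho > 0$, we have

$$\rho = \frac{1 - k + \sqrt{(1 - k)^2 + 4 k \rho_1}}{2}. \tag{27}$$

Using (27) in the second equation in (14), we can compute the optimum solutions
for $\rho$ and $\rho_1$ by iteration.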

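Appendix 2: Derivation of Equation (20)

From (16) and (17), the revenue R is expressed (with time measured in units of
h) as

$$R = \frac{\rho_1}{\sigma_1} U_1 + \frac{\rho_2}{\sigma_2} U_2. \tag{28}$$

Define

$$f = f_1 + r f_2 \tag{29}$$

where $f_1 = \rho_1 U_1$, $f_2 = \rho_2 U_2$ and $r = \sigma_1/\sigma_2$. Then,
using (29) in (28), we have

$$R = \frac{f}{\sigma_1}. \tag{30}$$

Applying the differentiation formula for a function of a function, the optimum
condition (19) becomes

$$\frac{\partial R}{\partial N} = \frac{\partial R}{\partial \rho} \frac{\partial \rho}{\partial N} = 0, \quad \frac{\partial R}{\partial N_1} = \frac{\partial R}{\partial \rho_1} \frac{\partial \rho_1}{\partial N_1} = 0. \tag{31}$$

Noting that the total occupancy $\rho$ is expressed as

$$\rho = \rho_1 + \rho_2 = N_1 \sigma_1 + (N - N_1) \sigma_2 = N_1 (\sigma_1 - \sigma_2) + N \sigma_2, \tag{32}$$

and $\sigma_1$ and $\sigma_2$ are constant values, we have

$$\frac{\partial R}{\partial \rho} = \frac{1}{\sigma_1} \frac{\partial f}{\partial \rho}, \quad \frac{\partial R}{\partial \rho_1} = \frac{1}{\sigma_1} \frac{\partial f}{\partial \rho_1}, \tag{33}$$

$$\frac{\partial \rho}{\partial N} = \sigma_2, \quad \frac{\partial \rho_1}{\partial N_1} = \sigma_1. \tag{34}$$

Using (33) and (34) in (31), we have

$$\frac{\partial f}{\partial \rho} = r \frac{\partial f_2}{\partial \rho} = 0, \quad \frac{\partial f}{\partial \rho_1} = \frac{\partial f_1}{\partial \rho_1} + r \frac{\partial f_2}{\partial \rho_1} = 0. \tag{35}$$

In a similar manner as in Appendix 1, from (35) the optimum condition (20) is
derived for the flat-base pricing method.

Abstract. The era of classical PBX-based call centers has passed. Those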
systems were proprietary and closed, i.e. with fixed functionality. Today, the
Internet and multimedia applications are becoming more and more popular
across the world, and there is a lot of effort in both academia and industry to
build and deploy modern Internet-based call centers. This paper should be
viewed as a contribution to these efforts. It presents our approach to Internet-
based virtual call center implementation. In contrast to other efforts, we
consider the virtual call center as a universal infrastructure, which could
also be used as a telecommunication management network center and as an
intelligent network service control point. In the paper we present our concept,
the most interesting implementation details, and the pilot network configuration.



1. Introduction 

The paper presents an approach to Internet-based Virtual Call Center (VCC) 
implementation. The approach is based on the experience gained from research and 
development of classical call centers [1-4] and similar systems for Russian 
telecommunication network [5-10]. The main lessons learned from this effort are the 
following: 

• End customers as well as software developers fully depend on hardware 
equipment manufacturer. 

• A classical call center consists of two loosely coupled parts: PBX-based (Private 
Branch Exchange) ACD (Automatic Call Distributor) and LAN (Local Area 
Network), which operate almost independently. 

• A classical call center serves PSTN (Public Switched Telephone Network) users 
only. There is no possibility to access the call center from other networks. 

• The quality of subscriber-agent communication is poor because it is based on
voice only.

• Proprietary ACD is a closed system, and it is hard to extend its functionality 
and/or capacity. 

In order to overcome these problems, the call center must be Internet/Intranet
based, and it has to be decentralized, i.e. virtual. This Virtual Call Center (VCC)
should provide services to all kinds of users, including telephone network subscribers
and Internet subscribers.

The core of the Virtual Call Center is the Intranet, which is based on TCP/IP
technology. The physical infrastructure could be Ethernet for smaller VCCs, and
FDDI (Fiber Distributed Data Interface) or ATM (Asynchronous Transfer Mode) for
bigger VCCs. The VCC Intranet is connected to the global Internet through the
Internet firewall, and to the PSTN through a PSTN gateway.

In our approach, the VCC is the infrastructure, which could be used as TMN 
(Telecommunication Management Network) center, and/or SCP (Service Control 
Point) of IN (Intelligent Network). We plan to keep the system open so that new IN 
services (not yet standardized by the ITU-T) could be introduced in the future, too. 

The paper is organized as follows. The next subsection briefly reviews the related
work. Section 2 presents our concept of the Internet-based virtual call center. In
Section 3 we describe the main implementation details. In Section 4 we describe the
pilot network, which is used for the experimental evaluation of our approach.
Section 5 contains the conclusions, followed by the list of references cited
throughout the paper.



1.1 Related Work 

Our approach is one of several efforts to improve existing call centers, which are
being undertaken in both academia [10-15] and industry. For example, Cisco Systems
is promoting its Architecture for Voice, Video and Integrated Data (AVVID) [16].
3Com has classified software-based call centers into Virtual Call Centers,
Web-enabled Call Centers, and Multimedia Call Centers [17]. Dialogic advocates
Internet-enabled Call Centers [18]. In our vision, however, the VCC will be used for
servicing both users/subscribers and other networks. For example, we plan to use a
VCC as a platform for TMN, too. Additionally, a VCC could add new functionality to
existing networks. For example, the VCC will provide an IN platform for the existing
PSTN.



2. Internet-Based Virtual Call Center Concept 

A classical call center, which consists of a PBX (Private Branch eXchange) and
computer workstations and servers connected to a LAN (Local Area Network), is
becoming obsolete today. The main reason is that the classical call center is a
proprietary closed system with fixed functionality. In addition, the quality of
voice-based communication between a customer and an agent is poor. In order to eliminate
these disadvantages, a modern call center must be Internet/intranet based, and it has to
be decentralized, i.e. virtual. Such a virtual call center (VCC) should provide services to
all kinds of users, including the following user categories:

• Analog subscribers. The call is made from the plain old telephone, connected to the 
PSTN. 

• Digital subscribers. The call is made from an ISDN (Integrated Services Digital
Network) terminal, which is connected to the PSTN/GSTN (General Switched
Telephone Network) through BRI/PRI (Basic Rate Interface / Primary Rate
Interface).

• Internet subscribers using a browser to access the call center.

• Internet/Intranet subscribers using H.323 terminal to access the services offered by 
the call center. 

• Internet E-Commerce Transaction users/subscribers and other database oriented 
services. 

The structure of the VCC is shown in Figure 1. The core of the Virtual Call Center is
an Intranet, which is based on TCP/IP technology. The physical infrastructure could be
Ethernet for smaller VCCs, and FDDI (Fiber Distributed Data Interface) or ATM
(Asynchronous Transfer Mode) for bigger VCCs.

The AOM (Administration, Operation & Maintenance) functions should be
provided through SNMP (Simple Network Management Protocol), which is the most
mature of the comparable management technologies. However, the VCC could
offer TMN (Telecommunication Management Network) services to the PSTN operator
through the SS7 (Signaling System number 7) platform provided by the VCC.

VCC services will be provided by different kinds of Agents, including the 
following categories: 

• Simple telephone-call agent. This agent has just a plain old telephone, and he/she 
just answers the telephone calls distributed by the CTI Server. 

• Call Agent with database support. This agent uses an H.323 terminal (a PC with a
LAN card, sound card, and hands-free headset), and he/she can issue database
queries while servicing a user/subscriber. H.323 conferencing with a
user/subscriber and/or other agent(s) is possible, too.

• Call & E-mail Agent. This agent uses an H.323 terminal to answer voice calls and to
reply to user/subscriber e-mail messages (which could include text, audio and
video).

• Other kinds of Agents could be added later. For example, H.323 remote agent 
working at home, remote agent with the Bluetooth headset, etc. 

VCC services are provided using different kinds of servers, including the following 
categories: 

• DNS (Domain Name System) & Proxy server.

• Mail server. 

• WWW (World Wide Web) server. 

• H.323 Gatekeeper (H.323 address translation and administration). 

• H.323 MCU used for H.323 multimedia conferencing. 

• AOM (Administration, Operation & Maintenance) server. 

• CTI (Computer Telephony Integration) Server. It distributes calls, maintains the
queues for individual services/agents, and supports CTI services inside the VCC.
There could be more than one CTI server in the system, in which case they operate
in load-sharing mode.

• IVR (Interactive Voice Response) Servers. 

• Application server(s), for different kinds of tasks. 

• Database server(s). 

The VCC is connected to the Internet through a bastion-host-based firewall. In the
future we plan to introduce a VPN (Virtual Private Networking) switch, too.

The VCC is connected to the PSTN through different kinds of Gateways, including the
following categories:

• Analog VoIP Gateway. This gateway is connected to the PSTN through a series
of plain old analog subscriber lines. It is seen by the PSTN as a virtual PBX. The lines
are so-called PBX series lines. This means that the LE (Local Exchange) will place a
new call to the PBX on the first available line from this series. This is the traditional
way of connecting a PBX to the PSTN and it is the most effective solution for
small VCCs. The Analog Gateway is a PC with a LAN card and a series of
commercial off-the-shelf modems and/or media cards. Voice transmission over the
Intranet should use minimal compression in order to provide high-quality voice
connections. For example, G.711 could be used for calls coming from the PSTN.
Additionally, the Intranet should use traffic classification (Layer 3 schemes such as
IP precedence or the Differentiated Services Code Point, Layer 2 schemes such as
802.1p, and also the Real-time Transport Protocol) and traffic prioritization
technology, in order to guarantee an acceptable QoS for user/subscriber-agent voice
connections.

• Digital VoIP Gateway. This gateway is connected to the PSTN through an E1/DSS1
interface (30B+D/DSS1, terminal side). The gateway is a PC with a LAN card and
an E1/HDLC/Speech card. The card could be commercial off-the-shelf or
proprietary.

• IN (Intelligent Network) Gateway. This gateway connects to the TE (Transit
Exchange) in the PSTN over two or more (in the future) E1/SS7 interfaces. The gateway
is a PC with a LAN card and E1/HDLC/Speech SS7 card(s).

• Some other kinds of Gateways could be added later. 
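
The Layer 3 classification mentioned for the analog gateway can be illustrated with a short sketch. It assumes that the voice stream is sent over UDP and that the Expedited Forwarding code point (46) is used; that value is a common choice for voice but is not prescribed by the paper, and the helper names are hypothetical. The DSCP occupies the upper six bits of the former IP TOS byte, so it is shifted left by two bits before being set on the socket.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding; assumed value, not taken from the paper


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits (0..63)")
    return dscp << 2


def open_marked_voice_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS sets the whole TOS byte; the switches still have to be
    # configured to honour the marking.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Marking at the sender only classifies the traffic; the prioritization itself is still performed by the switches according to the central policy.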

The Gateways in our terminology are also known as “front-end” computers. The
solution is highly scalable. For small VCCs one front-end computer suffices. For
larger ones, further front-ends could be added. This solution addresses reliability
issues, too: for reliability purposes, backup (reserve) front-ends could be added.

When more than one PSTN gateway exists in the system, the gateways operate in
load-sharing mode. For incoming calls to the VCC, the PSTN views the set
of PSTN gateways as a group of trunks, and it allocates the trunk (channel) to be
used for the connection. For an outgoing call from the VCC, on the other hand, the
CTI server that handles the call offers the call to all available PSTN gateways. The
PSTN gateway that replies first continues to handle the call. The requests to the
other PSTN gateways are postponed.
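
The outgoing-call behaviour described above amounts to a "first reply wins" selection. The following is a minimal sketch under stated assumptions: the gateways are simulated by coroutines with fixed reply delays, the names `gateway_reply` and `offer_outgoing_call` are hypothetical, and the pending requests are simply cancelled, which is a simplification of the "postponed" handling in the paper.

```python
import asyncio
from typing import Mapping


async def gateway_reply(name: str, delay: float) -> str:
    """Stand-in for a PSTN gateway that answers a call offer after `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def offer_outgoing_call(gateways: Mapping[str, float]) -> str:
    """Offer the call to every gateway and return the name of the first to reply."""
    tasks = [asyncio.create_task(gateway_reply(n, d)) for n, d in gateways.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # simplification: the paper speaks of postponing these requests
    await asyncio.gather(*pending, return_exceptions=True)
    # If several gateways answer at the same instant, any of them may be picked.
    return next(iter(done)).result()


if __name__ == "__main__":
    print(asyncio.run(offer_outgoing_call({"gw1": 0.2, "gw2": 0.01, "gw3": 0.3})))
```

With this scheme the load is shared implicitly: a busy gateway answers later and therefore tends to lose the race.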

Obviously, for a system such as the VCC, security is a crucial issue.
The information contained in the VCC, and the services it offers, must be protected.
Resistance to attacks from the PSTN is provided through correct implementation of the
applied signaling system (for example DSS1). Database inquiries/transactions are
secured by PINs (Personal Identification Numbers). Attacks from the Internet are
prevented by the firewall and VPN support. Providers using the same VCC
infrastructure are separated by VLAN (Virtual LAN) support.

Another important issue is the extendibility of the system. The system is designed 
in such a way that the introduction of new services doesn’t affect the agent call 
handling, i.e. how he accepts, transfers, or releases the call. Furthermore, the system 
supports CCR (Customer Controlled Routing) functionality through the CCR scripts. 
The CCR support enables call-processing software reusability.

Of course, if a new service is introduced, the corresponding application and
database server programs must be deployed. To make it easier to introduce a new
service we are using component-based software design and Commercial Off-The-Shelf
(COTS) components. Additionally, instead of a simple TCP/IP client-server
architectural model, we use a CORBA-based three-tier (client / application server /
database server) architectural model.

Let us summarize this section of the paper. The VCC can distribute calls coming from
users of the PSTN and/or the Internet to the agents. Moreover, certain computers in the
VCC could act as an SCP, which is connected to one or more SSPs (Service Switching
Points). These dedicated computers automatically perform the SCP functions. Of
course, VCC personnel perform SCP administration, operation and maintenance. Finally,
the VCC could be used as a TMN center, too. In that case VCC agents act as TMN
operators, who supervise and control the telecommunication network.



3. Internet-Based Virtual Call Center Implementation 

As the most interesting implementation details we have selected the following: 

• Local voice call (i.e. virtual PBX call) processing. 

• PSTN inbound (incoming) call processing. 

• Internet inbound call processing. 

The components involved in a local call processing are the following: 

• According to ITU-T Q.71: FE1, FE2, FE4, and FE5

• AB analysis (ABA) component. This component analyses calling and called party 
numbers. 

• IVR (Interactive Voice Response) component. This component plays RANs 
(Recorded Announcements) to a user/subscriber, and accepts his selections. 

• Call Distributor (CD), Service working Group (SG), and Agent (AG) components.

Static component relations are shown in Figure 2. The standard Q.71 FE1-FE2-FE4-FE5
structure has been extended with ABA for B analysis, CD, SG, and AG for
ACD functionality, and IVR for interactive voice response functionality.

Dynamic component relations are shown in Figure 3, in the form of an MSC
(Message Sequence Chart). Figure 3 shows the telephony-related component
interaction.

The standard Q.71 MSC has been customized for the ACD (Automatic Call
Distribution) functionality. The inserted messages are as follows:

• CD signals a new call arrival to SG with the “NEW_CALL” message. 

• SG accepts a call with the “SG_CALL_ACCEPTED” message. 

• A welcome message and music are played with a sequence of “PLAY”
messages.

• SG offers a call to an agent with the “CALL_OFFER” message. 

• AG accepts a call with the “CALL_ACCEPTED” message. 

• SG signals that an agent has accepted the call with the “OPERATOR_READY” 
message.

Fig. 2. The static component relations for a local call processing

[Figure 3: message sequence chart. Message labels recoverable from the chart: SETUP req/ind, PROCEEDING req/ind, REPORT req/ind (ALERTING), SETUP resp/cnf, NEW_CALL, SG_CALL_ACCEPTED, PLAY (WELCOME) REQ, PLAY ACCEPTED, PLAY, PLAY END, OPERATOR_READY, CALL_OFFER, CALL_ACCEPTED.]

Fig. 3. The telephony-related MSC for a local call processing

The most important aspect of the Internet-related component interaction is the 
usage of IP multicasting. IP multicasting is used for two purposes. The first one is for 
playing an announcement and/or music to the calling party, as follows: 

• A “PLAY_REQ” (play request) message is sent, using IP multicasting, to all active
IVRs, which are registered in the corresponding multicast group. Note
that at least one IVR must be active, but there may be more
of them in the system.

• The IVRs reply with the “PLAY ACCEPTED” message, using IP unicast.

• The IVR that replied first services the call. It will receive the “PLAY”
message. The other IVRs will receive the “DISCONNECT” message.
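
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the multicast group address, the port and the function names are hypothetical, and the message payload is reduced to its name. `send_play_req` multicasts the request, while `resolve_replies` decides which message each replying IVR receives next, given the IVR addresses in order of reply arrival.

```python
import socket
import struct
from typing import List, Tuple

IVR_GROUP = "239.1.2.3"  # hypothetical administratively scoped multicast group
IVR_PORT = 5007          # hypothetical UDP port


def send_play_req(payload: bytes = b"PLAY_REQ", ttl: int = 1) -> None:
    """Multicast a play request to every IVR that has joined IVR_GROUP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # A TTL of 1 keeps the datagram inside the local (Intranet) segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(payload, (IVR_GROUP, IVR_PORT))


def resolve_replies(repliers: List[str]) -> List[Tuple[str, str]]:
    """Map each replying IVR (in arrival order) to the message it is sent next."""
    if not repliers:
        raise RuntimeError("no active IVR answered the play request")
    winner, *others = repliers
    return [(winner, "PLAY")] + [(ivr, "DISCONNECT") for ivr in others]
```

For example, `resolve_replies(["10.0.0.2", "10.0.0.3"])` assigns “PLAY” to the first address and “DISCONNECT” to the second.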

The second use of IP multicasting is the following. When a call is released, the SG
that is involved in its processing should advertise the availability of the agent that
serviced the call.

One more aspect of the Internet-related component interaction has to be clarified:
the establishment and release (opening and closing) of the information (voice)
connection. This phase of call processing is handled by the H.245 protocol. When a new
call arrives at the call center, it comes to the CTI server. The CTI server, together with
the IVR and the selected H.323 agent, processes the signaling phase of the call up to the
point at which the CTI server sends the “REPORT” message to the H.323 client (i.e. new
call from PSTN). After that point the H.323 client and the agent open the information
channel on their own using the H.245 protocol. Neither the CTI server nor the IVR needs
to be involved in establishing the information channel between the PSTN subscriber and
the call center agent.

The components involved in PSTN inbound call processing are the following:

• Q.71: FE6, FE4, and FE5

• AB analysis (ABA)

• IVR

• CD, SG, and AG.

Static component relations are shown in Figure 4. The standard Q.71 FE6-FE4-FE5
structure has been extended with ABA for B analysis, CD, SG, and AG for ACD
functionality, and IVR for interactive voice response.




Fig. 4. The static component relations for a PSTN inbound call processing 

Dynamic component relations are very similar to those already shown in Figure 3: 
FE6 plays a role similar to FE2 in local call processing. The difference is the 
additional interaction of FE6 with the PSTN gateway. 

In addition to calls coming from the PSTN, the VCC supports calls coming
from the Internet. A user with a WWW browser (“Internet Explorer”, “Netscape
Navigator”, or some other) loads the VCC home “.html” page from the VCC WWW
server. This page loads other pages and associated applets in response to user
selections. The applets communicate with the application server’s CORBA (JAGUAR)
components, which in turn communicate with a database server. This is a so-called
3-tier system. A special VCC client does the database administration. A user may
contact a VCC agent if and when needed by using the “talk-to-me” button.



4. The Pilot Network 

For the purpose of experimental evaluation of our approach to VCC implementation,
a 100 Mb/s pilot network with the following configuration will be used:

• Two layer 3 (L3) switches, working back-to-back through a 1 Gb/s interconnection,
using VRRP (Virtual Router Redundancy Protocol).

• Two layer 2.5 (L2.5) switches, which support two traffic priority queues (high 
priority and low priority). 

• Two simple layer 2 (L2) switches. 

• Farm of about 10 servers (CTI, IVR, www, application, database, etc.) 

• 50 workstations. 

• Internet connection through the router and VPN switch. 

• One PSTN gateway. 

• One RAS (Remote Access Server). 

• One access point for wireless workstations. 

For a system such as the VCC, traffic classification and prioritization is a crucial
issue. The policy making must be centralized. It is done through a Directory
Service (DS), which enables the administrative personnel to specify priorities for
individual applications running in a VCC. Once the “policy” is made, it is distributed
to the individual VCC switches. The switches in turn start operating according to
the central VCC policy. In this way the call-processing software that we develop
will be generic, i.e. independent of the concrete VCC configuration. This is important,
because with such a strategy performance issues become issues of
Intranet dimensioning and traffic policy making. Even on modest configurations VoIP
can work with acceptable performance. Of course, IPv6 will solve all of the problems
in the future.

Apart from performance evaluation, we expect the pilot network to help us 
evaluate reliability, scalability, and extendibility issues, too. 



5. Conclusions 

In this paper we have presented our approach to Internet-based virtual call center 
implementation. We have described the concept, the most interesting implementation 
details, and the pilot network configuration. 

In addition to other efforts in the field, we view the VCC as a universal
infrastructure, which could be used as a TMN center and IN SCP.

Abstract. Decoupled-CBQ, a CBQ-derived scheduler, has proved to be a
substantial improvement over CBQ. The main advantages of D-CBQ are a new set of
rules for distributing excess bandwidth and the ability to guarantee bandwidth
and delay separately, whence the name "decoupled". This paper aims at
the characterization of D-CBQ by means of an extended set of simulations and
a real implementation in the ALTQ framework.



1. Introduction 

Class Based Queuing (CBQ) [1] and Hierarchical Fair Service Curve (H-FSC) [2]
represent an interesting solution for integrated networks that aim to provide
hierarchical link-sharing, tighter delay bounds and bandwidth guarantees. However,
the configuration of H-FSC is less intuitive from an Internet Service Provider's point
of view; therefore CBQ is the most appealing advanced scheduler available today.

An in-depth analysis of CBQ, on the other hand, shows several problems; most of 
them have been pointed out in [3] and [7]. This paper aims at the completion of [7] by 
summarizing the D-CBQ characteristics, presenting the implementation issues, and 
characterizing this new scheduler. 

This paper is structured as follows. Section 2 summarizes D-CBQ characteristics; 
Section 3 presents the simulations used to validate the prototype, while Section 4 
discusses the efforts in implementing D-CBQ in a real router. Finally, Section 5
presents some concluding remarks.



2. Decoupled Class Based Queuing 

The most important D-CBQ characteristics include new link-sharing guidelines, the 
decoupling between bandwidth and delay and the excellent precision in respecting 
bandwidth and delay.



2.1 New Link-Sharing Guidelines

The new link-sharing guidelines require the definition of a new concept, the Bounded
Branch Subtree (BBS). All the bounded¹ classes plus all the classes that are children of
the root class are called BBS-roots. Each BBS-root generates a Bounded Branch
Subtree that includes the set of classes that share that BBS-root as a common ancestor
plus the BBS-root itself. BBSs can be embedded; an example can be seen in Fig. 1.



Fig. 1. Bounded Branch Subtrees. Classes can be either unbounded ("UnBnd") or bounded
("Bnd")



Each BBS acts as a new link-sharing hierarchy that is almost independent from the 
others. A class may belong to several BBSs, therefore it can have several BBS-root 
classes. Among these BBS-roots, the one with the lowest level in the link-sharing 
hierarchy is called L-BBS-root. The BBS generated by the L-BBS-root is called L- 
BBS. 

The distribution of the bandwidth is done by means of a two-step process that gives
precedence to unsatisfied leaf classes. According to the first rule, a leaf class is
allowed to transmit immediately if it is underlimit² and its L-BBS-root is underlimit
as well. This prevents the L-BBS from consuming bandwidth reserved for other
subtrees.
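
The definitions of BBS-root and L-BBS-root, together with the first rule, can be made concrete with a small sketch. The class and function names are hypothetical, the `underlimit` flag stands in for the throughput estimator of a real scheduler, and two assumptions are made: levels grow from the leaves towards the root (as in CBQ), so the lowest-level BBS-root is the first one met when walking from a class towards the root, and the root class itself is not treated as a BBS-root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedClass:
    name: str
    parent: Optional["SchedClass"] = None
    bounded: bool = False
    underlimit: bool = True  # would be derived from measured throughput


def is_bbs_root(cls: SchedClass) -> bool:
    """A BBS-root is any bounded class or any direct child of the root class."""
    if cls.parent is None:  # assumption: the root class is not a BBS-root
        return False
    return cls.bounded or cls.parent.parent is None


def l_bbs_root(cls: SchedClass) -> SchedClass:
    """Walk towards the root and return the first (lowest-level) BBS-root."""
    node = cls
    while node.parent is not None:
        if is_bbs_root(node):
            return node
        node = node.parent
    raise ValueError("the root class has no L-BBS-root")


def may_send_rule1(leaf: SchedClass) -> bool:
    """First rule: the leaf and its L-BBS-root must both be underlimit."""
    return leaf.underlimit and l_bbs_root(leaf).underlimit
```

Under these assumptions a bounded leaf is its own L-BBS-root, so an underlimit bounded leaf is never blocked by the state of its ancestors, whereas an unbounded leaf is blocked as soon as its L-BBS-root becomes overlimit.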

A second rule distributes excess bandwidth to all the unbounded classes
(according to the L-BBS they belong to) when no classes are allowed to send according
to the first rule. A class is allowed to get more bandwidth when it has a non-overlimit³
ancestor A⁴ at level i and there are no unsatisfied classes in its L-BBS at levels lower
than i. This guarantees that the excess bandwidth is distributed inside the L-BBS;
therefore an overlimit class is allowed to transmit provided that there is still
bandwidth available in its L-BBS.

¹ A bounded class is a class whose traffic cannot exceed its allocated rate.

² A class is underlimit if its throughput (averaged over a certain period) does not exceed its
allocated rate. See [1] for more details.

The first rule allows a leaf class that is underlimit and bounded (the suggested
configuration for real-time classes) to transmit without constraints, while an
underlimit and unbounded leaf class can be delayed when its L-BBS-root is overlimit.
Bounded leaf classes are never influenced by the behavior of other classes, while
unbounded classes are. This does not represent a problem because an unsatisfied class
that is delayed will be served as soon as its L-BBS-root becomes underlimit.
Starvation does not occur and underlimit leaf classes are always able to reach their
target rate; however, the time interval used to monitor their throughput must be larger
than the time interval used to monitor bounded leaf classes.

Unfortunately, these rules do not guarantee that a BBS-root never becomes
overlimit. An embedded L-BBS-root can be allowed to transmit (because it is
underlimit), making the higher-level BBS-root overlimit. It follows that even D-CBQ
does not respect the link-sharing structure perfectly; however, this is the price that has
to be paid in order to have some classes that are always able to send (provided that they
respect their service rate). This price is required to provide excellent support
for guaranteed-bandwidth classes that, on the other hand, are not allowed to send
more than their guaranteed rate.



2.2 Decoupling Bandwidth and Delay 

This feature requires that the excess bandwidth be distributed independently of
the class priority. D-CBQ uses two distinct systems of cascading WRR (Weighted Round
Robin) schedulers: the first one is activated when bandwidth is distributed according to
the first link-sharing rule; the second one distributes excess bandwidth and is enabled
when bandwidth is allocated according to the second rule. The first WRR uses priorities
(i.e. it gives precedence to high-priority traffic) and guarantees that each class is able
to get its allocated rate; the second WRR does not take priorities into account and
selects classes according to their share.

This mechanism is rather simple but effective: network managers can assign each 
user a specific value of bandwidth and delay and they will be certain that, whatever 
the priority is, excess bandwidth will be distributed evenly to all the currently active 
sessions. 

Suspending Overlimit Classes 

It has been widely recognized that link-sharing rules and delay guarantees cannot be
met at the same time; therefore there are small time intervals in which one of them
cannot be guaranteed. The decoupling between bandwidth and delay has to be
integrated with new rules that determine when a class has to be suspended (because
either it or one of its parents is overlimit) and how long the suspension time will be.

³ A class is non-overlimit when it does not exceed its rate, averaged over a certain period.

⁴ The class must be able to borrow from ancestor A. Basically, A can be either the root class (in
case the L-BBS-root is unbounded) or a generic ancestor belonging to the L-BBS.

The answer to the first question is based on the new link-sharing guidelines: a class is
suspended when it is not allowed to transmit according to the second rule. D-CBQ allows
any class to be suspended, whereas CBQ allows suspension of leaf classes only. In this
case D-CBQ suspends the highest-level ancestor (i.e. the nearest to the root class) from
which the class is allowed to borrow and that is overlimit. Since the ancestor class is
suspended, all unbounded leaf classes that share this ancestor are no longer allowed to
transmit. Bounded classes, of course, are still allowed to transmit (first link-sharing
guideline).

The answer to the second question (how long) is based on the observation that a class
(or a subtree, in the D-CBQ case) must be suspended for the time needed to become
compliant with the allocated rate; hence the suspension time will be that of the class
that is being suspended, that is, the ancestor class. It follows that the suspension time
depends on the extradelay of the ancestor class; therefore it depends on the bandwidth
allocated to the intermediate class instead of the one allocated to leaf classes.
Generally speaking, the suspension time depends on the "upper overlimit ancestor"
instead of the leaf class.




Fig. 2. Test topology. 



3. Simulation Results 

This section presents some comparative results between CBQ (in the implementation
that comes with ns-2) and D-CBQ, obtained with the ns-2 simulator. The results have been
obtained by defining two different test suites: the first one is devoted to bandwidth and
link-sharing properties and the second one to the delay objective⁵. Each test
suite consists of several tests; each test has a different class configuration
(link-sharing structure, borrow, priority) and is structured in several simulations that
have different incoming traffic and different bandwidth allocated to each class.

⁵ Simulations measure the scheduling delay, i.e. the time between the arrival of the packet and
the time the scheduler finishes its transmission on the output link.

D-CBQ (as well as CBQ) is, by nature, a non-work-conserving algorithm, although
it can be easily modified in order not to leave the output link idle when there are
unbounded backlogged classes. For this reason, the results compare CBQ with two
different versions of D-CBQ, standard (described in Section 2) and efficient. Efficient
D-CBQ (D-CBQe) looks like a work-conserving algorithm and adds a new rule to D-CBQ.
When no classes are allowed to transmit according to the link-sharing guidelines,
D-CBQe sends a packet from the first unbounded and backlogged class it encounters.
It is therefore not a pure work-conserving algorithm, because bounded classes are not
able to exploit the output link idleness.

The efficient mechanism introduces a small degree of unfairness into D-CBQ. To keep
the algorithm simple, the efficient process selects classes on a priority-based
schedule; hence high-priority classes can get more bandwidth. Moreover, D-CBQ
internal variables are not updated in case of a transmission due to this mechanism: the
rationale is that a class should not be punished for the bandwidth consumed when no
classes are allowed to transmit.
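
The additional rule of the efficient mode can be sketched as a fallback selection. This is a minimal illustration under stated assumptions: the `allowed` flag summarizes the outcome of the two link-sharing rules, a lower `priority` number means a higher priority, and the names are hypothetical rather than taken from the ns-2 or ALTQ code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueState:
    name: str
    priority: int   # assumption: lower number means higher priority
    bounded: bool
    backlog: int    # packets waiting in the class queue
    allowed: bool   # outcome of the two link-sharing rules


def efficient_pick(classes: List[QueueState]) -> Optional[QueueState]:
    """Return the class served by the efficient rule, or None if it does not apply."""
    if any(c.allowed and c.backlog > 0 for c in classes):
        return None  # the regular link-sharing rules select the class
    candidates = [c for c in classes if not c.bounded and c.backlog > 0]
    # The first unbounded, backlogged class in priority order is served; the
    # rate estimators of that class are deliberately left untouched.
    return min(candidates, key=lambda c: c.priority, default=None)
```

Because bounded classes are excluded from the candidates, the link can still stay idle while only bounded classes are backlogged, which is why the scheme is not purely work-conserving.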

Simulations use a simple topology (Fig. 2) composed of a single scheduler and
several sources attached to it. The number of sources, their rate and their traffic
pattern vary among the simulations. Sessions under test use CBR and Poisson sources
because they are simple and easy to control. Tests repeated using
real sources (UDP/TCP sessions simulating real traffic) do not show any difference
compared to the previous ones.




[Figure 3: trace plot; horizontal axis: Time (s)]

Fig. 3. New Link-Sharing guidelines.

New Link-Sharing Guidelines

This test shows that D-CBQ is able to guarantee each session a more predictable
service. Fig. 3 shows a typical trace in which both CBQ and D-CBQ are able to
provide the correct share over large-scale intervals. However, this is not true over
small scales: the triangle trace shows that CBQ often suspends the class (large amount
of time between two packets); this does not happen with D-CBQ.

Decoupling Bandwidth and Delay

D-CBQ has strong decoupling characteristics. For example, Table 1 reports the results
obtained with a simple 1-level hierarchy (all classes are children of the root class).
Priority does not influence bandwidth in D-CBQ. When all classes compete for
the bandwidth (first two tests), they are able to obtain the assigned share. Moreover,
the last test shows that CBQ assigns all the available bandwidth (not used by class B) to
the high-priority class, whereas D-CBQ allocates the excess bandwidth to all classes
in proportion to their share. The result is that, from this point of view, D-CBQ looks
more like a Weighted Fair Queuing scheme than a Priority Queuing scheme, and it is able
to force malicious users not to exceed their allocated rate. Setting higher priorities
(which means lower delays) in D-CBQ is no longer a way to obtain more bandwidth.



Table 1. Decoupling bandwidth and delay; all classes are allowed to borrow.

                              Traffic (Kbps)
         Share  Priority  In     CBQ out   D-CBQ out  Theor.
Class A  1%     LOW       100    55.25     21.46      20
Class B  99%    LOW       2000   1944.77   1978.56    1980

Class A  1%     HIGH      200    181.82    21.41      20
Class B  99%    LOW       2000   1818.19   1978.61    1980

Class A  10%    HIGH      2000   604.08    250.37     250
Class B  20%    LOW       ...    ...       ...        ...
Class C  70%    LOW       2000   1395.94   1749.65    1750
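
The "Theor." column of Table 1 can be reproduced with a few lines of arithmetic. The sketch below assumes a 2000 Kbps output link (inferred from the fact that the theoretical values of each test add up to 2000; the link rate is not stated in this part of the paper) and assumes that every active class offers more traffic than it finally receives. The function name is hypothetical.

```python
from typing import Dict


def theoretical_output(link_kbps: float,
                       shares: Dict[str, float],
                       offered_kbps: Dict[str, float]) -> Dict[str, float]:
    """Split the link among the active classes in proportion to their shares."""
    active = {c: s for c, s in shares.items() if offered_kbps.get(c, 0) > 0}
    total = sum(active.values())
    return {c: link_kbps * s / total for c, s in active.items()}


if __name__ == "__main__":
    # Third test of Table 1: class B is idle, so A and C share the link 10:70.
    print(theoretical_output(2000,
                             {"A": 10, "B": 20, "C": 70},
                             {"A": 2000, "B": 0, "C": 2000}))
```

For the third test this yields 250 Kbps for class A and 1750 Kbps for class C, i.e. the bandwidth left unused by class B is divided in the ratio 10:70 regardless of priority.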



Quality Indexes 

The previous tests cannot fully validate D-CBQ from other
points of view (delay, precision). Moreover, these tests have to be carried out
using several different configurations; therefore the data need to be summarized in
order to display the results.

Comparison among different results is done using the following set of quality
indexes:

$$Q^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{D_i - D_i^*}{D_i^*} \right)^2$$

$$|Q| = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| D_i - D_i^* \right|}{D_i^*}$$

where $D_i^*$ is the expected test result (the theoretical value for link-sharing tests; the
best obtained delay in the current simulation for delay tests), $D_i$ is the actual result of
the simulation and $N$ is the number of simulations. The quadratic quality index ($Q^2$)
highlights tests in which the behavior is significantly different from the expected
value; therefore it is used to identify any idiosyncrasies between theoretical and real
behavior. Vice versa, the linear index ($|Q|$) is the relative difference of the simulation
results compared to the theoretical ones and it can be used to show the precision of
CBQ and D-CBQ against the expected result. Best results are obtained when these
indexes tend to zero.



Link-sharing Test Suite Details

The main objective is to verify the ability to provide each class with its target
bandwidth; therefore the sources (CBR) exceed the rate allocated to their class. The
link-sharing test suite is made up of 10 different tests that use three different class
configurations (details in Appendix I); each test aims at the evaluation of specific
aspects of the algorithm.

A brief summary of the results (details can be found in [5]) is shown in Table 2:
the quality index for CBQ is by far the worst of all. The results confirm that D-CBQ
performs far better than the original algorithm, particularly in tests with borrowing
enabled and different priorities among classes. |Q| shows that the difference between
experimental and theoretical results is greatly reduced, from 14.1% for CBQ to
1.7% for D-CBQ.
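
As an illustration of how such indexes can be computed, the sketch below takes the quadratic index as the mean of the squared relative deviations and the linear index as the mean of their absolute values. This normalization is an assumption based on the description of the indexes in the text, and the function name is hypothetical; the results are fractions, so they must be multiplied by 100 to be compared with the scale used in Table 2.

```python
from typing import Sequence, Tuple


def quality_indexes(actual: Sequence[float],
                    expected: Sequence[float]) -> Tuple[float, float]:
    """Return (quadratic index, linear index) of the relative deviations."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # Relative deviation of each simulation result from its expected value.
    rel = [(d - d_star) / d_star for d, d_star in zip(actual, expected)]
    n = len(rel)
    quadratic = sum(r * r for r in rel) / n
    linear = sum(abs(r) for r in rel) / n
    return quadratic, linear


if __name__ == "__main__":
    q2, q1 = quality_indexes([110.0, 90.0], [100.0, 100.0])
    print(f"Q^2 = {q2:.4f}, |Q| = {q1:.4f}")
```

Since the quadratic index squares each deviation, a single badly served class dominates it, whereas the linear index reports the average relative error.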

An interesting point is that the efficient version of D-CBQ performs worse than the
standard version. Efficient mode, in fact, introduces another trade-off between link
utilization and precision. For instance, a class might not be allowed to
transmit at its target time when efficient mode is turned on because the scheduler
could be busy servicing another packet.



Table 2. Link-sharing test results (values * 100). 

          Quadratic Quality Index         Linear Quality Index
Test      CBQ         D-CBQ     D-CBQe    CBQ         D-CBQ     D-CBQe
Test 1    0.3112      0.0111    0.0111    5.0110      0.8345    0.8345
Test 2    0.0000      0.0000    0.0000    0.0008      0.0008    0.0008
Test 3    0.2841      0.0646    0.0646    5.0140      2.0899    2.0899
Test 4    51.7934     0.0948    0.0948    30.5246     1.7590    1.7590
Test 5    0.2994      0.0721    0.0721    4.8157      2.2212    2.2212
Test 6    1091.2810   0.0886    0.0886    137.4755    1.6486    1.6486
Test 7    0.0008      0.0001    0.0001    0.2325      0.0671    0.0671
Test 8    0.0099      0.0001    0.0001    0.8036      0.0780    0.0780
Test 9    0.1672      0.0482    0.2211    3.5506      1.7568    2.7292
Test 10   0.3756      0.1706    0.3479    5.0937      2.5652    3.8744
Global    73.1666     0.0815    0.1784    14.0525     1.7387    2.3698



A small problem still remains: even D-CBQ is not able to transmit all the traffic 
when input sources are transmitting at their maximum rate. This is due to internal 
approximations. A good practice consists in slightly over-provisioning the 
bandwidth allocated to that class; an in-depth analysis of this phenomenon is left to 
future studies. 

Delay Test Suite Details 

The delay test suite is made up of five tests that differ in the characteristics of the 
real-time sources. Simulations use both CBR and Poisson traffic for real-time sources, and 
a set of several VBR sessions for data traffic. Real-time sources transmit slightly less 
than the bandwidth allocated to their class; data traffic, on the contrary, exceeds its 
allocation in order to congest the output link. Three tests use a single CBR 
source for each session (each test has a different packet size); the fourth uses three 
CBR sources with different packet sizes (120, 240, 480 bytes) for each session, while 
the last uses Poisson sources. The last two tests use a token-bucket limiter to regulate 
real-time sources and to control the input pattern (and the source's peak rate) with excellent 
accuracy. Each test is composed of twelve different simulations with different link-sharing 
structures, priorities and borrowing characteristics. 
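A token-bucket limiter of the kind mentioned above can be sketched as follows. This is a generic textbook token bucket, not the limiter used in the simulations; the class name, the rate and the bucket depth are hypothetical, and only the packet sizes (120, 240, 480 bytes) come from the test description.

```python
class TokenBucket:
    """Generic token bucket: tokens are bytes, refilled at a constant rate."""

    def __init__(self, rate_bps, depth_bytes):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.depth = float(depth_bytes)   # maximum burst size in bytes
        self.tokens = float(depth_bytes)  # the bucket starts full
        self.last = 0.0                   # time of the last update (seconds)

    def conforms(self, now, size_bytes):
        """Return True and consume tokens if the packet may be sent now."""
        # Refill for the elapsed time, never beyond the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


# Hypothetical limiter: 8 kbit/s (1000 bytes/s) with a 480-byte bucket.
tb = TokenBucket(rate_bps=8000, depth_bytes=480)
for t, size in [(0.0, 480), (0.0, 120), (0.25, 240), (0.25, 120)]:
    print(t, size, tb.conforms(t, size))
```

The bucket depth bounds the burst a source may emit, while the refill rate bounds its long-term rate, which is how the limiter shapes the input pattern.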

This paper summarizes the results related to the maximum delay experienced by 
packets and the delay experienced by 99% of them (99-percentile). Results are 
given only for the real-time traffic because best-effort traffic exceeds its allocation; 
its delay therefore has no significance. Detailed analyses of delay bounds are left for 
future work [6]; here only a brief summary is given. 
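The two delay figures reported here can be extracted from a list of per-packet delays as in the sketch below. The nearest-rank percentile definition and the sample data are assumptions made for illustration; the paper does not state which percentile estimator was used.

```python
import math


def delay_summary(delays_ms, percentile=99.0):
    """Return (maximum delay, percentile delay) using the nearest-rank method."""
    if not delays_ms:
        raise ValueError("no delay samples")
    ordered = sorted(delays_ms)
    # Nearest rank: smallest sample with at least `percentile` percent
    # of all samples less than or equal to it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[-1], ordered[rank - 1]


# Hypothetical delays: 1, 2, ..., 200 ms.
samples = [float(i) for i in range(1, 201)]
maximum, p99 = delay_summary(samples)
print(maximum, p99)
```

The maximum is sensitive to a single late packet, whereas the 99-percentile describes what almost all packets experience, which is why the two metrics can rank schedulers differently.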



Table 3. Maximum delay tests: results (values * 100). 




Starting from the maximum experienced delay, the results (shown in Table 3) 
confirm that D-CBQ outperforms CBQ in all tests. Packets flowing through CBQ 
have a maximum delay that is far larger than the one experienced with D-CBQ. CBQ 
performs better only in a few simulations, and this is due to its different (and wrong) 
implementation of the WRR mechanism (*). 

The D-CBQ improvement in the 99-percentile delay bound (Table 4) is not as 
evident as in the maximum delay bound. D-CBQ, however, still guarantees 
smaller delays: the 99-percentile delays of the CBQ packets are 4 times larger than 
the D-CBQ ones. Results in which CBQ outperforms D-CBQ (like PS) are due 
(again) to the different WRR implementation. 

Fig. 4 shows a typical distribution of the delay in CBQ and D-CBQ. It points 
out that the delay distribution for high-priority classes is almost the same, even if CBQ 
has a small percentage of packets with significantly larger delays than D-CBQ. 



(*) A detailed analysis of the CBQ code shows that it does not implement the WRR 
mechanism correctly and it often sets a class allocation to zero arbitrarily instead of leaving it 
negative. This operation has a non-negligible impact on classes with large packets compared 
to their allocation: these classes no longer need to wait several rounds before being able to 
transmit. Some configurations take particular advantage of that, hence CBQ may show better 
delay bounds. 

Table 4. 99-percentile delay tests: results (values * 100). 





Fig. 4. A typical cumulative distribution of the delay in CBQ and D-CBQ (horizontal axis: time intervals in ms, from 0 to 48). 

D-CBQe performance is never better than that of D-CBQ. Even if the efficient part of 
the algorithm affects unbounded classes only, this feature influences real-time classes 
(which are usually bounded) as well, because it sometimes forces real-time classes to 
wait longer before transmitting a packet. 



4. ALTQ Implementation 

The ALTQ implementation is almost the same as the ns-2 one. The most important 
difference is the inability to wake up a class exactly at its target time, since this would 
require setting a new timer event each time a class is suspended. This might overload 
the processing power of the machine; therefore the suspension time is approximated 
within discrete intervals based on the kernel timer (set to 1 kHz on our machines) 
available in the BSD kernel. Other differences are related to real-world issues, for 
example the presence of output interface buffers (virtually a FIFO queue after the 
CBQ scheduler) and the possible mismatch between the theoretical wire speed and the 
real one (for instance, header compression, link-layer headers and more may alter the 
real speed). The first problem, pointed out in [3], is of great importance and is largely 
reduced in ALTQ 3.0 (several interface drivers have been modified). The second 
point has no solution at this moment, and it results in some mismatch between 
some theoretical values (for example the packet departure time) and the real ones. 
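The effect of approximating the suspension time on a 1 kHz timer can be sketched as below. Rounding the wake-up time up to the next tick is an assumption made for this example (the text only says the time is approximated within discrete intervals), and the function name is hypothetical.

```python
def quantize_wakeup_us(target_us, hz=1000):
    """Round a target wake-up time (microseconds) up to the next timer tick.

    Assumes the class is woken at the first tick at or after its target
    time; the real ALTQ code may use a different rounding policy.
    """
    if hz <= 0 or 1_000_000 % hz:
        raise ValueError("hz must be a positive divisor of 1,000,000")
    tick_us = 1_000_000 // hz
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_us // tick_us) * tick_us


target = 2300  # hypothetical target time: 2.3 ms after the reference instant
actual = quantize_wakeup_us(target)
print(actual, actual - target)  # wake-up tick and the added latency in us
```

Under this assumption the extra latency is always below one tick (1 ms at 1 kHz), which is small compared with the delays of a congested link but not negligible for short packets on fast links.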

The ALTQ implementation has been validated using a subset of the tests already used in 
ns-2, and the results confirm the good behavior of D-CBQ observed in the simulations. The 
most important result, however, is that D-CBQ is only slightly more complex than 
CBQ, even with un-optimized code. This is evident primarily in the 
borrow tests, in which D-CBQ has to check the status of the class's ancestors. 



Table 5. ALTQ tests snapshot: maximum throughput (packets per second) on a Pentium 133 
machine. 

Test        Packet size (bytes)   CBQ    D-CBQ   Difference
No Borrow   40                    7080   6599    -6.8%
            58                    6992   6463    -7.6%
Borrow      40                    6847   6186    -9.7%
            58                    6691   6140    -8.2%
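The "Difference" column of Table 5 is the relative throughput change of D-CBQ with respect to CBQ. The short sketch below recomputes it from the packet rates in the table; the helper name is hypothetical.

```python
def relative_difference(cbq_pps, dcbq_pps):
    """Throughput change of D-CBQ relative to CBQ, in percent."""
    return (dcbq_pps - cbq_pps) / cbq_pps * 100.0


# (test, packet size in bytes, CBQ pps, D-CBQ pps), copied from Table 5.
rows = [
    ("No Borrow", 40, 7080, 6599),
    ("No Borrow", 58, 6992, 6463),
    ("Borrow", 40, 6847, 6186),
    ("Borrow", 58, 6691, 6140),
]
for test, size, cbq, dcbq in rows:
    print(f"{test:10s} {size:3d} {relative_difference(cbq, dcbq):+.1f}%")
```

The overhead stays below 10% in every row and is largest in the borrow test with 40-byte packets.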



5. Conclusions 

D-CBQ has proved to be a substantial improvement over CBQ. Its 
characteristics allow the deployment of this scheduler on networks with advanced 
requirements (hierarchical link-sharing, bandwidth guarantees, delay bounds). 

Efficient D-CBQ has been shown not to be worthwhile from the link-sharing and delay 
point of view. However, further analyses are needed in order to evaluate the 
advantages of this algorithm on best-effort traffic: we should expect some 
improvements in terms of link utilization and throughput for these classes. 

The next step will be a better characterization of D-CBQ from the viewpoint of 
delay, in order to give a mathematical indication of the maximum delay bound 
experienced by a D-CBQ session. 

The source code for ns-2 and ALTQ, together with the test scripts, is online at the author's 
website. 



Acknowledgements 

The author thanks Salvatore Iacono, Giordana Lisa and Kenjiro Cho for many 
discussions about CBQ internals. Many thanks also to Ivan Ponzanelli and Lucio Mina 
for their insightful help in testing and validating the prototypes, and to Panos Gevros, Mario 
Baldi and Jon Crowcroft for their comments. 

This work has been partially sponsored by Telecom Italia Lab, S.p.A., Torino 
(Italy). 



Implementation and Characterization of an Advanced Scheduler 95 



References 

[1] Sally Floyd and Van Jacobson, Link Sharing and Resource Management Models 
for Packet Networks, IEEE/ACM Transactions on Networking, Vol. 3 No. 4, 
August 1995. 

[2] Ion Stoica, Hui Zhang, T. S. Eugene Ng, A Hierarchical Fair Service Curve 
Algorithm for Link-Sharing, Real-Time and Priority Service, in Proceedings of 
SIGCOMM '97, September 1997. 

[3] Fulvio Risso and Panos Gevros, Operational and Performance Issues of a CBQ 
router, ACM Computer Communication Review, Vol. 29 No. 5, October 1999. 

[4] The VINT Project, UCB/LBNL/VINT Network Simulator - ns (version 2). 
Available at http://www-mash.cs.berkeley.edu/ns/. 

[5] Ivan Ponzanelli, Garanzie di servizio con schedulers di tipo Class Based 
Queuing, Laurea Thesis, Politecnico di Torino, July 2000. In Italian. 

[6] Fulvio Risso, Delay Guarantees in D-CBQ, Draft Paper, Politecnico di Torino, 
April 2001. 

[7] Fulvio Risso, Decoupling Bandwidth and Delay Properties in Class Based 
Queuing, in Proceedings of the Sixth IEEE Symposium on Computers and 
Communications (ISCC 2001), July 2001. 



Appendix I 

This appendix presents the details of the link-sharing and delay test suites. 




First link-sharing structure: 
  Test 1: no borrow, single flow 
  Test 2: borrow, single flow 
  Test 3: no borrow, both flows, same priority 
  Test 4: borrow, both flows, same priority 
  Test 5: no borrow, both flows, different priorities 
  Test 6: borrow, both flows, different priorities 

Second link-sharing structure: 
  Test 7: borrow, two flows, same priority 
  Test 8: borrow, two flows, different priorities 

Third link-sharing structure: 
  Test 9: different borrow, flow and priority configurations 
  Test 10: same as Test 9; classes with different packet sizes 

Fig. 5. Link-sharing test suite: link-sharing structure. 

The link-sharing test suite is made up of 10 different tests that use the three 
different class configurations shown in Fig. 5. Each test evaluates 
specific aspects of the algorithm. In detail: 

• Test 1: precision of the traffic carried by a single class, taken in isolation 

• Test 2: ability of a single class, taken in isolation, to exploit all the link 
bandwidth 

• Tests 3, 4: ability to share the bandwidth correctly among peer classes; tests 
have been performed with and without borrowing 

• Tests 5, 6: same as tests 3 and 4; classes have different priorities 

• Tests 7, 8: ability to share the excess bandwidth correctly; classes can have 
either equal or different priorities 

• Tests 9, 10: ability to respect the imposed bandwidth among classes with 
different priorities, borrowing configurations and incoming traffic; Test 10 repeats the 
same simulations using different packet sizes among classes. Configuration 
details are shown in Table 6, as well as the expected throughput of each class. 

The delay test suite is similar to the previous one: it consists of five link-sharing 
hierarchies, with different class configurations and traffic. A summary of the test 
characteristics is reported in Fig. 6. 



Table 6. Details of tests 9 and 10 (throughput in Kbps). 

                                 Classes
Simulation  Parameter            A     1     2     B     3      4
1           Priority             LOW   LOW   LOW   LOW   LOW    LOW
            Borrow/Traffic       Y/-   Y/Y   Y/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400
2           Priority             LOW   LOW   LOW   LOW   LOW    LOW
            Borrow/Traffic       N/-   N/Y   N/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400
3           Priority             LOW   LOW   LOW   LOW   LOW    LOW
            Borrow/Traffic       N/-   N/Y   N/Y   Y/-   Y/N    Y/N
            Expected throughput        200   400
4           Priority             LOW   LOW   LOW   LOW   LOW    LOW
            Borrow/Traffic       Y/-   Y/Y   Y/Y   Y/-   Y/Y*   Y/Y
            Expected throughput        200   400         200    1200
5           Priority             LOW   LOW   LOW   LOW   LOW    LOW
            Borrow/Traffic       N/-   Y/Y   Y/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400
6           Priority             HIGH  HIGH  HIGH  LOW   LOW    LOW
            Borrow/Traffic       Y/-   Y/Y   Y/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400
7           Priority             HIGH  HIGH  LOW   HIGH  LOW    HIGH
            Borrow/Traffic       Y/-   Y/Y   Y/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400
8           Priority             HIGH  HIGH  HIGH  LOW   LOW    LOW
            Borrow/Traffic       N/-   Y/Y   Y/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400
9           Priority             HIGH  HIGH  LOW   HIGH  LOW    HIGH
            Borrow/Traffic       N/-   Y/Y   Y/Y   Y/-   Y/N    Y/Y
            Expected throughput        200   400                1400

* This test uses an on-off source, with a 33% activity period. 

Abstract. This paper compares the simulated performance of RED 
routers and ECN routers. The results show that ECN provides better 
goodput and fairness than RED for heterogeneous TCP flows. When 
demand is held constant, increasing the number of flows generating the 
demand has a negative effect on performance. ns-2 simulations with many 
flows demonstrate that the bottleneck router's marking probability must 
be aggressively increased to provide good ECN performance. These 
experiments suggest that an adaptive version of ECN should provide better 
performance than ECN. 



1 Introduction 

With increased World Wide Web traffic has come heightened concern about In- 
ternet congestion collapse. Since the first congestion collapse episode in 1986, 
several variants of TCP (Tahoe, Vegas, Reno and NewReno) have been devel- 
oped and evaluated to provide host-centric mechanisms to combat high packet 
loss rates during heavy congestion periods. Additionally, researchers have pro- 
posed new congestion avoidance techniques for Internet routers. While the initial 
concept was to use packet loss at FIFO routers to signal congestion to the source, 
the resulting drop-tail behavior failed to provide adequate early congestion no- 
tification and produced bursts of packet drops that contribute to unfair service. 

Since the introduction of Random Early Detection (RED) in 1993, 
researchers have proposed a variety of enhancements and changes to router 
management to improve congestion control while providing fair, best-effort service. 
Although RED has outperformed drop-tail routers in several simulation and 
test-bed experiments, Christiansen et al. have demonstrated 
that tuning RED for high performance is problematic when one considers the 
variability of Internet traffic. 

RED has been shown to be unfair when faced with heterogeneous flows, 
and the recommended RED parameter settings are not aggressive enough 
in heavy congestion generated by a large number of flows. 

Concern over reduced performance on the Internet during traffic bursts such 
as Web flash crowds helped spawn the IETF recommendation for new active 
queue management techniques that provide early congestion notification to TCP 
sources. Several research studies have reported better performance 
for Explicit Congestion Notification (ECN) when compared against RED. These 
results add support to the Internet draft "Addition of ECN to IP". However, 
most of these studies cover only a limited portion of the traffic domain space. 
Specifically, little attention has been given to evaluating the effects of a large 
number of concurrent flows. Although a couple of these studies consider fairness 
among competing homogeneous flows, ECN behavior with heterogeneous flows 
has not been thoroughly studied. 

This paper presents results from a series of ns-2 simulations comparing the 
ability of RED and ECN to provide fair treatment to heterogeneous flows. The 
goal of this report is to add to the existing information on ECN behavior specifi- 
cally with regard to the impact of the number of flows, the effect of ECN tuning 
parameters on performance, and the effectiveness of ECN’s congestion warnings 
when many flows cause the congestion. The results of this study provide insight 
into a new active queue management scheme, AECN, Adaptive ECN. 

Section 2 briefly defines a few measurement terms and reviews previous ECN 
studies to provide context for our experiments. Section 3 discusses experimental 
methods. The next section analyzes the simulated results and the final section 
includes concluding remarks. 

2 Definitions and Background 

The performance metrics used in this investigation include delay, goodput and 
two ways to evaluate fairness. The delay is the time in transit from source to des- 
tination and includes queuing time at the router. Goodput differs from through- 
put in that it does not include retransmitted packets in the count of packets 
successfully arriving at the receiver. Given a set of flow throughputs 
$(x_1, x_2, \ldots, x_n)$, Jain's fairness index is defined in terms of the following function: 

$$f(x_1, x_2, \ldots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}$$

A second form of fairness, introduced in Section 4, focuses on the difference 
between the maximum and minimum average goodput for groups of heterogeneous 
flows. 
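Both fairness measures are easy to compute once per-flow or per-group goodputs are available. The sketch below uses the standard definition of Jain's index; the function names and the sample values are hypothetical.

```python
def jain_index(values):
    """Jain's fairness index: 1.0 for equal shares, 1/n when one flow gets all."""
    total = sum(values)
    squares = sum(x * x for x in values)
    if not values or squares == 0:
        raise ValueError("need at least one non-zero value")
    return total * total / (len(values) * squares)


def goodput_gap(group_means):
    """Difference between the best and worst average goodput among flow groups."""
    return max(group_means.values()) - min(group_means.values())


print(jain_index([1.0, 1.0, 1.0, 1.0]))   # perfectly fair
print(jain_index([1.0, 0.0, 0.0, 0.0]))   # one flow takes everything
# Hypothetical group averages in Mbps.
print(goodput_gap({"fragile": 1.0, "average": 2.5, "robust": 4.0}))
```

The index summarizes fairness across all flows in a single number, while the gap makes the disadvantage of the worst-served group directly visible.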

Random Early Detection (RED) utilizes two thresholds (min_th, max_th) 
and an exponentially-weighted average queue size, ave_q, to add a probabilistic 
drop region to FIFO routers. max_p is a RED tuning parameter used to control 
the RED drop probability when ave_q is in the drop region. The drop probability 
increases linearly towards max_p as ave_q moves from min_th to max_th. When 
ave_q reaches max_th, RED switches to a deterministic (100%) drop probability. 
max_th is set below the actual queue length to guarantee drops that signal router 
congestion before the physical queue overflows. 
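The drop function described above can be written down directly. This is a simplified sketch of the base probability only: practical RED implementations additionally adjust the probability according to the number of packets accepted since the last drop, which is omitted here. The function name is hypothetical, and the example thresholds mirror values used later in the simulations.

```python
def red_drop_probability(ave_q, min_th, max_th, max_p):
    """Base RED drop probability as a function of the average queue size."""
    if not (0 <= min_th < max_th and 0.0 <= max_p <= 1.0):
        raise ValueError("invalid RED parameters")
    if ave_q < min_th:
        return 0.0                      # no drops below the lower threshold
    if ave_q >= max_th:
        return 1.0                      # deterministic drop region
    # Probabilistic region: linear ramp from 0 to max_p.
    return max_p * (ave_q - min_th) / (max_th - min_th)


for q in (4, 5, 17.5, 29, 30):
    print(q, red_drop_probability(q, min_th=5, max_th=30, max_p=0.5))
```

Note the discontinuity at max_th: the probability jumps from just under max_p to 1, so a small max_p makes the transition into the deterministic region abrupt.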
Explicit Congestion Notification (ECN) marks a packet (instead of 
dropping) when ave_q is in the probabilistic drop region. In the deterministic 
drop region, ECN drops packets just as RED does. We briefly consider an ECN 
variant, ECNM, that marks packets in the deterministic region. 
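The difference between the three router behaviors can be summarized as a decision on a packet that the queue manager has selected for congestion signalling. The sketch below assumes ECN-capable flows and that ECNM drops only on physical queue overflow; the function and scheme names are hypothetical labels for this example.

```python
def action_for_selected_packet(scheme, ave_q, min_th, max_th, queue_full=False):
    """Return 'forward', 'mark' or 'drop' for a packet chosen by the queue manager.

    In the probabilistic region the packet is assumed to have already been
    selected by the random test; this function only decides what happens to it.
    """
    if scheme not in ("RED", "ECN", "ECNM"):
        raise ValueError(f"unknown scheme: {scheme}")
    if queue_full:
        return "drop"                       # physical overflow always drops
    if ave_q < min_th:
        return "forward"                    # no congestion signal needed
    if ave_q < max_th:                      # probabilistic region
        return "drop" if scheme == "RED" else "mark"
    # Deterministic region: only ECNM keeps marking.
    return "mark" if scheme == "ECNM" else "drop"


for scheme in ("RED", "ECN", "ECNM"):
    print(scheme,
          action_for_selected_packet(scheme, 10, 5, 30),
          action_for_selected_packet(scheme, 35, 5, 30))
```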

Lin and Morris define fragile TCP flows as those emanating from sources 
with either large round-trip delays or small send window sizes and robust flows 
as having either short round-trip delays or large send windows. This delineation 
emphasizes a flow's ability to react to indications of both increased and decreased 
congestion at the bottleneck router. Our experiments simulate three distinct flow 
groups (fragile, average, and robust flows). These flows differ only in their 
end-to-end round-trip times (RTTs). To simplify the analysis, the maximum sender 
window is held fixed at 30 packets throughout this investigation. 

Floyd's original ECN paper shows the advantages of ECN over RED 
using both LAN and WAN scenarios with a small number of flows. Bagal et al. 
compare the behavior of RED, ECN and a TCP rate-based control mechanism 
using traffic scenarios that include 10 heterogeneous flows. They conclude that 
RED and ECN provide unfair treatment when faced with either variances due to 
the RTTs of the heterogeneous flows or variances in actual flow drop probabilities. 
Focusing on a window advertising scheme (GWA), Gerla et al. compare 
GWA, RED, and ECN in scenarios with up to 100 concurrent flows. Using the 
gap between maximum and minimum goodput as a fairness measure, they show 
that ECN yields better fairness than RED for homogeneous flows. Salim and 
Ahmed use Jain's fairness index to compare ECN and RED performance for a 
small number of flows. Their results emphasize that max_p can significantly 
affect performance. The ns-2 experiments discussed in this paper combine and 
extend these results. 

3 Experimental Methods and Simulation Topology 

This study uses the newest version of the Network Simulator from UCB/LBNL, 
ns-2, to compare the performance of ECN and RED routers with TCP Reno 
sources. The simulation network topology (shown in Figure 1) consists of one 
router, one sink and a number of sources. Each source has an FTP connection 
feeding 1000-byte packets into a single congested link. The bandwidth of the 
bottleneck link is 10 Mbps with a 5 ms delay time to the sink. The one-way link 
delays for the fragile, average and robust sources are 145 ms, 45 ms and 5 ms 
respectively. Thus, the fragile, average and robust flows have round-trip times 
of 300 ms, 100 ms and 20 ms when there is no queuing delay at the router. 

Fig. 1: Simulation Topology 
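The round-trip times quoted above follow from the link delays: a packet crosses the source link and the bottleneck link in each direction. A minimal check, with a hypothetical helper name:

```python
BOTTLENECK_DELAY_MS = 5  # one-way delay from the router to the sink


def base_rtt_ms(source_delay_ms, bottleneck_delay_ms=BOTTLENECK_DELAY_MS):
    """Round-trip time with an empty router queue (no queuing delay)."""
    return 2 * (source_delay_ms + bottleneck_delay_ms)


for group, delay in [("fragile", 145), ("average", 45), ("robust", 5)]:
    print(group, base_rtt_ms(delay), "ms")
```

Any queuing delay at the router adds to these values equally for all three groups, so the relative advantage of robust flows shrinks as the queue grows.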

All simulations ran for 100 simulated seconds. Half the flows were started 
at time 0 and the other half were started at 2 seconds. The graphs presented 
exclude the first 20 seconds to reduce transient startup effects. The router for all 
simulations has a min_th of 5 packets and a physical queue length of 50 packets. 
Except for the maximum send window size of 30 packets, all other parameters 
use the ns-2 default values. 

4 Results and Analysis 

A series of ns-2 experiments was run such that the cumulative traffic flow into 
the heavily congested router remains fixed at 600 Mbps even though the 
number of flows is varied across simulations. In all cases, the number of flows is 
equally divided among the three flow categories. Thus, 15 flows in the graphs 
implies 5 fragile, 5 average and 5 robust flows, each with a 40 Mbps data rate, 
whereas a graph point for 120 flows implies a simulation with 40 fragile, 40 
average and 40 robust flows, each with a 5 Mbps data rate. Simulations were run 
with the total number of flows set at 15, 30, 60, 120, 240, 480 and 600 flows. 
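The per-flow data rate and group sizes for each simulation point follow directly from the fixed 600 Mbps demand. A small sketch with hypothetical names:

```python
TOTAL_DEMAND_MBPS = 600
FLOW_COUNTS = [15, 30, 60, 120, 240, 480, 600]


def flow_plan(total_flows, demand_mbps=TOTAL_DEMAND_MBPS):
    """Return (flows per group, per-flow rate in Mbps) for three equal groups."""
    if total_flows <= 0 or total_flows % 3:
        raise ValueError("the flow count must be a positive multiple of 3")
    return total_flows // 3, demand_mbps / total_flows


for n in FLOW_COUNTS:
    per_group, rate = flow_plan(n)
    print(f"{n:3d} flows: {per_group:3d} per group, {rate:g} Mbps each")
```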

Figure 2 gives ECN and RED goodput with the number of flows varying 
from 15 to 600. ECN with max_p = 0.5 provides the best goodput in all cases 
except 15 flows. In the other router configurations there is a large drop in goodput 
beginning at 64 flows. Figure 3 presents the delay for ECN and RED with max_p 
= 0.5. This figure shows the clear advantage robust flows have with respect to 
delay, but more importantly it demonstrates that the ECN goodput improvement 
from Figure 2 is offset by a small increase in the one-way delay for ECN. 

Figures 4 and 5 track the effect of varying max_p and max_th in simulations 
with 30 and 120 flows respectively. Figure 4 shows that max_th has little effect 
on goodput above max_p = 0.2. In Figure 5 where 120 flows provide the same 
flow demand as 30 flows in Figure 3, ECN with max_p = 0.5 and max_th = 30 
yields the highest goodput and there is no max_p setting for RED that works 
well. 

Figure 6 employs Jain's fairness index to quantify RED and ECN behavior. ECN 
is fairer than RED in almost all situations. Since perfect fairness has a Jain's 
fairness index of 1, it is clear that as the number of flows goes above 120 none of 
the choices prevent unfairness. The fact that ECN with max_p = 0.1 is fairest at 
30 flows while max_p = 0.5 is the fairest at 60 and 120 flows implies the marking 
probability could be dynamically adjusted for ECN based on a flow count 
estimator. The unfairness at a high number of flows can be partially attributed to 
a lockout phenomenon where some flows are unable to get any packets through 
the congested router for the duration of the simulation. Locked out flows begin 
to appear for both RED and ECN above 120 flows. 

Fig. 2: RED and ECN Goodput, max_th=30 

Fig. 3: RED and ECN Delay, max_p=0.5, max_th=30 

Fig. 4: Goodput with 30 flows 

Fig. 5: Goodput with 120 flows 

Fig. 6: RED and ECN Fairness, max_th=30 



Figures 7 through 9 provide a visual sense of max-min fairness for RED and 
ECN via the gap between the average goodputs for the three flow groups. 

Aggregate goodput in these graphs is the sum of the fragile, average, and 
robust goodputs. ECN provides better aggregate goodput than RED in all three 
graphs, but the difference is most pronounced in Figure 9 where the traffic is 
generated by 120 flows. Figures 7 and 8 differ only in an increase of max_p from 0.2 
to 0.8. The more aggressive ECN marking in Figure 8 provides better goodput 
for robust flows than RED. However, this change does not reduce the goodput 
gap between robust and fragile flows. Figure 9 keeps max_p = 0.8 but simulates 
120 flows. Although aggregate goodput remains relatively unchanged for ECN 
in Figure 9, the goodput for the robust flows goes down while the goodput of 
the average and fragile flows increases slightly. This implies that an adaptive 
ECN that uses different values of max_p for heterogeneous flows can provide 
improvement in the visual max-min fairness. RED goodput is adversely affected 
by more flows. 

Fig. 7: Goodput Distribution, 30 flows, max_p=0.2, max_th=30 (curves: aggregate, fragile, average and robust flows, for ECN and RED) 

Fig. 8: Goodput Distribution, 30 flows, max_p=0.8, max_th=30 (curves: aggregate, fragile, average and robust flows, for ECN and RED) 








104 R. Kinicki, Z. Zheng 




Fig. 9: Goodput Distribution, 120 flows, max_p=0.8, max_th=30 (curves: aggregate, fragile, average and robust flows, for ECN and RED) 

Fig. 10: Throughput Distribution, 120 flows, max_p=0.8, max_th=30 (curves: aggregate, fragile, average and robust flows, for ECN and RED) 



The significance of using goodput instead of throughput as a performance 
metric can be clearly seen in Figures 9 and 10. Because goodput excludes re- 
transmissions, RED has 15% lower goodput than ECN in Figure 9. Since RED 
drops and ECN marks, the RED drops trigger more TCP retransmissions. This 
effect is completely hidden in Figure 10 where aggregate RED throughput is 
only slightly lower than aggregate ECN throughput. 
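The distinction between the two metrics can be made concrete with a small calculation that follows the definition given in Section 2: goodput leaves retransmitted packets out of the count of packets arriving at the receiver. The packet counts below are hypothetical and chosen only to illustrate a 15% gap; the 1000-byte packet size and the 80-second measurement window (100 s minus the first 20 s) come from the simulation setup.

```python
def throughput_and_goodput_mbps(received, retransmitted, packet_bytes, duration_s):
    """Return (throughput, goodput) in Mbps for one measurement window."""
    if retransmitted > received or duration_s <= 0:
        raise ValueError("inconsistent packet counts or duration")
    bits_per_packet = packet_bytes * 8
    throughput = received * bits_per_packet / duration_s / 1e6
    # Goodput excludes retransmitted packets from the delivered count.
    goodput = (received - retransmitted) * bits_per_packet / duration_s / 1e6
    return throughput, goodput


# Hypothetical counts: 100,000 packets delivered, 15,000 of them retransmissions.
print(throughput_and_goodput_mbps(100_000, 15_000, 1000, 80))
```

Two routers can therefore show almost identical throughput while differing markedly in goodput when one of them causes many more retransmissions.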

Figure 11 compares ECN with ECNM. Recall that ECNM differs from standard 
ECN in that ECNM marks packets when the average queue size exceeds max_th 
and drops packets only when the router queue overflows. The figure shows that 
ECN provides better goodput except at small values of max_p and that ECNM 
appears quite sensitive to the max_th setting. 







Fig. 11: ECN and ECNM Goodput with 120 flows 



5 Conclusions and Future Work 

This paper reports on a series of ns-2 simulations that compare ECN and RED 
performance with heterogeneous TCP flows. Generally ECN provides better 
goodput and is fairer than RED. The results show that for fixed demand the 
performance of both mechanisms decreases as the number of flows increases. 
However, ECN with an aggressive max_p setting provides significantly higher 
goodput when there are a large number of heterogeneous flows. ECN also had a 
higher Jain's fairness index in the range of flows just below where flow lockouts 
occurred. 

In the simulations studied, neither the RED nor the ECN strategy was fair to fragile 
and average flows. These results suggest that if congestion control is to handle 
Web traffic consisting of thousands of concurrent flows with some degree of 
fairness, then further enhancements to ECN are needed. We are currently conducting 
simulations with an adaptive version of ECN that adjusts max_p based on the 
round-trip time of a flow and an estimate of the current number of flows in each 
flow group. 

Abstract. We present a diffusion model of a network node controlled by 
the RED mechanism, used to indicate congestion but not to delete packets. 
Diffusion approximation allows us to study the dynamics of flow changes 
introduced by this mechanism in a more efficient way than simulation. 
After introducing some basic notions on diffusion approximation and on 
our approach to solve diffusion equations analytically or numerically, we 
present a closed loop model of flow control and investigate the influence 
of delay and of control parameters on the performance of the system. The 
FECN/BECN scheme is also considered: the flow remains constant within an 
interval of fixed length and is changed in the next interval if the number of 
marked packets during the interval is above a certain threshold. Diffusion 
results are validated by simulation. 



1 Introduction 

Diffusion approximation has proven to be an efficient tool for analysing 
transient states in various traffic control mechanisms, e.g. leaky bucket or 
threshold queue and sliding window. Here, we adapt it to model the dynamics of the 
Random Early Detection (RED) algorithm, which has received much attention 
and is recommended by the Internet Engineering Task Force as a queue 
management scheme for rapid deployment, as it turns out that the end-to-end TCP 
window-based congestion control mechanism is not sufficient to ensure Internet 
stability and should be supplemented by router-based schemes. Its performance 
has been modelled with the use of simulation or Markov chains. The principle of 
the RED mechanism is to start discarding packets with a specified probability 
before the buffer becomes full, as opposed to the Tail Drop mechanism. 
The probability of discarding packets is given by a specified drop function, see 
Fig. 1. The argument $\eta$ of this function is a weighted moving average of the queue 
length: $\eta := (1 - w)\eta + wn$, where w is a constant and n is the current queue length 
upon arrival of a new customer. Explicit Congestion Notification (ECN) is an 
extension proposed to RED which marks a packet instead of dropping it. Since ECN 
marks packets before congestion actually occurs, this is useful for protocols like 
TCP that are sensitive to even a single packet loss. Upon receipt of a congestion 
marked packet, the TCP receiver informs the sender (in the subsequent ACK) 
about incipient congestion, which will in turn trigger the congestion avoidance 
algorithm at the sender. In this paper, we study the performance of the ECN 
mechanism using diffusion approximation and simulation. We also compare it with a 
virtual circuit model in Frame Relay or ATM networks with a closed-loop feedback 
control mechanism. The model is based on the use of the FECN/BECN scheme. 
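The weighted moving average that feeds the RED drop function is a one-line update applied at each packet arrival. The sketch below illustrates it; the function name, the weight value and the queue-length trace are assumptions made for the example.

```python
def update_average(eta, n, w):
    """One step of the moving average: eta := (1 - w) * eta + w * n."""
    if not 0.0 < w <= 1.0:
        raise ValueError("w must be in (0, 1]")
    return (1.0 - w) * eta + w * n


# Hypothetical trace: the instantaneous queue jumps from 0 to 40 packets.
eta, w = 0.0, 0.002  # a small weight makes the average react slowly
for arrival in range(1, 1001):
    eta = update_average(eta, 40, w)
    if arrival in (1, 10, 100, 1000):
        print(arrival, round(eta, 3))
```

With a small w the average lags well behind a sudden burst, so short bursts are tolerated while persistent queues eventually push the average into the drop (or marking) region.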



Fig. 1. RED drop function (vertical axis: d(n)) 



After introducing some basic notions on diffusion approximation and on our 
approach to solve diffusion equations analytically (Section 2) or numerically 
(Section 3), we present a closed loop model of flow control based on the RED 
mechanism (Section 4) and its extension to the FECN/BECN scheme (Section 5). Section 
6 gives several numerical results presenting validation of diffusion models and a 
study of the influence of some control parameters on the network performance. 
Section 7 presents conclusions. 



2 Transient Solution to G/G/1/N Queue: Diffusion Approximation, Analytical Approach 

Diffusion approximation represents the number N(t) of customers in a queue 
at a time t by the value of a diffusion process X(t); the density function $f(x, t; x_0)$ 
of this process is given by the diffusion equation 

$$\frac{\partial f(x,t;x_0)}{\partial t} = \frac{\alpha}{2}\,\frac{\partial^2 f(x,t;x_0)}{\partial x^2} - \beta\,\frac{\partial f(x,t;x_0)}{\partial x}$$

and approximates the queue distribution: $f(n,t;n_0) \approx p(n,t;n_0) = \Pr[N(t) = n \mid N(0) = n_0]$. 
The diffusion parameters $\alpha$, $\beta$ represent the characteristics of the input 
flow and service time distribution. For a G/G/1 or G/G/1/N model where $1/\lambda$, 
$\sigma_A^2$ are the mean and variance of the interarrival time distribution A(x) and $1/\mu$, $\sigma_B^2$ are the 
mean and variance of the service time distribution B(x), they are chosen as, e.g. 



Diffusion model with two barriers and instantaneous returns of G/G/l/N 
station was proposed by Gelenbe in 0 where the steady-state solution of the 
model was given. The diffusion process has two barriers at a; = 0 and x = N. 
When the process comes to a; = 0, it stays there for a time representing the 
idle period and then jumps immediately to a: = 1 starting new busy period. 
Similarity, when the process attains a; = it stays at the barrier for a time 
representing the period when the queue is full and then jumps to a: = — 1. To 

obtain transient solution, we proceed similarity as in jS| representing the density 
function of diffusion process with instantanous returns by a superposition of 
densities of another diffusion process: the one with absorbing barriers at a; = 0 
and X = N. 

Gonsider a diffusion process with two absorbing barriers at a; = 0 and x = N , 
started at t = 0 from x = Xq- Its probability density function (/)(x,t;xo) has 
well known form, see e.g. EEI. If the initial condition is defined by a function 
ip(x), X € (0,N), lim^^-j-o ■*/'(2^) = ip(x) = 0, then the pdf of the process 

has the form (j){x , t', tp) = The probability density function 

f{x, t; Ip) of the diffusion process with elementary returns is composed of the 
function ^(a:, t; tp) which represents the influence of the initial conditions and of 
a spectrum of functions 4>{x, t — t; 1), </>(a:, t — t; N — 1) which are pd functions 
of diffusion processes with absorbing barriers at a; = 0 and x = N, started at 
time r < t at points a: = 1 and a; = A^ — 1 with densities gi^r) and gN-i(T): 

= 4’{x,f,'P)+ gi{T)(p‘ix,t-T-,l)dT+ gN-i{T)(l){x,t-T-,N-l)dT. 

Jo Jo 

( 3 ) 

Densities 7o(t), 7w(i) of probability that at time t the process enters to a; = 0 
or X = N are 

7o(i) =Po(0)i5(t) -k [1 -po(0) -pjv(0)]7^,o(i) + [ gi{T)-fi^o{t - T)d,T 



where 7i,o(^), 7i,Af(^)j 'yN-i,N{t) are densities of the first passage time 

between points specified in the index, e.g. 



P = X — fi , a = -I- = C\X + Cgg ■ 



( 2 ) 




( 4 ) 



7i,o(f) = 




P(j){x,t] 1)] • 



( 5 ) 
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The functions 7 y,,Ar(t) denote densities of probabilities that the initial 

process, started at t = 0 at the point ^ with density at time t by 

entering respectively x = 0 or x = N. 

Finally, we may express $g_1(t)$ and $g_{N-1}(t)$ with the use of functions $\gamma_0(t)$ and
$\gamma_N(t)$:

$$g_1(\tau) = \int_0^\tau \gamma_0(t)\,l_0(\tau - t)\,dt, \qquad g_{N-1}(\tau) = \int_0^\tau \gamma_N(t)\,l_N(\tau - t)\,dt, \tag{6}$$

where $l_0(x)$, $l_N(x)$ are the densities of sojourn times in x = 0 and x = N;
the distributions of these times are not restricted to exponential ones. Laplace
transforms of Eqs. (4) and (6) give us $\bar g_1(s)$ and $\bar g_{N-1}(s)$; the Laplace transform of the
density function $f(x,t;\psi)$ is obtained as

$$\bar f(x,s;\psi) = \bar\phi(x,s;\psi) + \bar g_1(s)\,\bar\phi(x,s;1) + \bar g_{N-1}(s)\,\bar\phi(x,s;N-1). \tag{7}$$

Probabilities that at the moment t the process has the value x = 0 or x = N are

$$\bar p_0(s) = \frac{1}{s}\left[\bar\gamma_0(s) - \bar g_1(s)\right], \qquad \bar p_N(s) = \frac{1}{s}\left[\bar\gamma_N(s) - \bar g_{N-1}(s)\right]. \tag{8}$$

The Laplace transforms $\bar f(x,s;\psi)$, $\bar p_0(s)$, $\bar p_N(s)$ are inverted numerically
following the Stehfest algorithm [?].
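
To make the inversion step concrete, here is a minimal Python sketch of the Gaver-Stehfest formula in its commonly published form, $f(t) \approx \frac{\ln 2}{t}\sum_{i=1}^{n} V_i\,F(i \ln 2 / t)$. It is not taken from the authors' software; the function names and the default of 12 terms are assumptions made for illustration.

```python
from math import factorial, log


def stehfest_coefficients(n_terms):
    """Stehfest weights V_1..V_n for an even number of terms n."""
    if n_terms % 2 != 0:
        raise ValueError("n_terms must be even")
    half = n_terms // 2
    coeffs = []
    for i in range(1, n_terms + 1):
        total = 0.0
        for k in range((i + 1) // 2, min(i, half) + 1):
            numerator = k ** half * factorial(2 * k)
            denominator = (factorial(half - k) * factorial(k) * factorial(k - 1)
                           * factorial(i - k) * factorial(2 * k - i))
            total += numerator / denominator
        coeffs.append((-1) ** (i + half) * total)
    return coeffs


def stehfest_invert(F, t, n_terms=12):
    """Approximate f(t) from its Laplace transform F(s), for t > 0."""
    a = log(2.0) / t
    weights = stehfest_coefficients(n_terms)
    return a * sum(v * F(i * a) for i, v in enumerate(weights, start=1))
```

For example, `stehfest_invert(lambda s: 1.0 / (s + 1.0), 1.0)` approximates exp(-1). The method only evaluates F on the real axis, which suits transforms such as Eqs. (7) and (8), but it loses accuracy for oscillating or discontinuous originals.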

3 Diffusion Approximation — Numerical Approach 

The approach presented above gives the transient solution to a diffusion equation
with constant parameters $\alpha$ and $\beta$. If the parameters are changing with time,
we should define the time periods (e.g. of the length of one mean service time)
where they may be considered constant and solve the diffusion equation within these
intervals separately; the transient solution obtained at the end of an interval serves
as the initial condition for the next one. If the input stream or service times
depend on the queue length, the diffusion parameters depend also on the value
of the process: $\alpha = \alpha(x,t)$, $\beta = \beta(x,t)$. In this case the diffusion interval
$x \in [0, N]$ is also divided into subintervals of unitary length and the parameters are
kept constant within these subintervals. For each time- and space-subinterval
with constant parameters, the transient diffusion solution is obtained. The equations
for space-intervals are solved together with balance equations for probability
flows between neighbouring intervals. For more complex models it is convenient
to solve a diffusion model entirely numerically. A method of lines [?] has been
adapted to fit the case of the diffusion equation. The basis of this method, which
is sometimes called the generalized method of Kantorovich, is substitution of
finite differences for the derivatives with respect to one independent variable,
and retention of the derivatives with respect to the remaining variables. This
approach changes a given partial differential equation into a system of partial
differential equations with one fewer independent variable.
There were the following implementation problems:

1. instantaneous returns introduce Dirac delta functions, which in turn have to
be approximated and cause singularities in the integrated function,

2. the sum of the integral of the probability density function and the probabilities
that the process is in the boundary barriers should be constant and equal to 1,

3. parameters of the equation are changing; since they are dependent on the
temporary state of the model, their values (especially $\alpha$) can be small enough to
cause a lack of stability of computations.

The first problem has been solved by approximating the Dirac delta function
with a rectangular impulse function dx wide and 1/dx high, where dx denotes the
integration step on the x axis. The second obstacle was avoided using a conservative,
centered, second-order scheme:

$$p_{k+1,n} = p_{k,n} - \left(\beta_{k,n+1}\,p_{k,n+1} - \beta_{k,n-1}\,p_{k,n-1}\right)\frac{\Delta t}{2\Delta x} + 0.5\left(p_{k,n+1}\,\alpha_{k,n+1} + p_{k,n-1}\,\alpha_{k,n-1} - 2\,p_{k,n}\,\alpha_{k,n}\right)\frac{\Delta t}{(\Delta x)^2}. \tag{9}$$

The probability mass which moves at a step k into the barriers is computed
as $p_{k+1,0}\,\Delta x$ and $p_{k+1,N}\,\Delta x$, where N denotes the location of the right barrier. This
mass is added to the probabilities $p_0$ and $p_N$ that the process is in barrier 0 or
N respectively, and $p_{k+1,0}$ and $p_{k+1,N}$ are set to 0 (the barriers absorb
the mass). The solution to the third problem is to approximate
the movement of the probability mass with a fluid flow approximation whenever
$\alpha(x,t)$ is too small. Unfortunately, conserving the probability mass would
require recomputing the whole integration step on the t axis every time such an action
occurs (we should also apply a special integration scheme in the nearest
neighbourhood of such a point). Thus, bearing in mind that the case of $\alpha$ close
to zero is exceptional and does not occur too frequently, the equation is simply
renormalized.
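
The conservative update and the barrier bookkeeping described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes an explicit time step, omits the instantaneous returns from the barriers and the fluid-flow fallback, and uses hypothetical names (`diffusion_parameters`, `step`). The helper evaluates $\beta = \lambda - \mu$ and $\alpha = C_A^2\lambda + C_B^2\mu$.

```python
def diffusion_parameters(lam, mu, ca2, cb2):
    """Return (beta, alpha) with beta = lam - mu and alpha = ca2*lam + cb2*mu."""
    return lam - mu, ca2 * lam + cb2 * mu


def step(p, alpha, beta, dt, dx, p0, pN):
    """One explicit step of the centered conservative scheme with absorbing barriers.

    p, alpha and beta are lists over grid nodes 0..M (node M is the right barrier).
    p[0] and p[M] must be zero on entry. Returns the new density and the updated
    barrier probabilities p0 and pN.
    """
    M = len(p) - 1
    ap = [a * v for a, v in zip(alpha, p)]  # alpha * p at each node
    bp = [b * v for b, v in zip(beta, p)]   # beta * p at each node
    new = [0.0] * (M + 1)
    for n in range(M + 1):
        ap_left = ap[n - 1] if n > 0 else 0.0
        ap_right = ap[n + 1] if n < M else 0.0
        bp_left = bp[n - 1] if n > 0 else 0.0
        bp_right = bp[n + 1] if n < M else 0.0
        new[n] = (p[n]
                  - dt / (2.0 * dx) * (bp_right - bp_left)
                  + 0.5 * dt / dx ** 2 * (ap_right + ap_left - 2.0 * ap[n]))
    # Mass that reached the barrier nodes is moved into the barrier probabilities.
    p0 += new[0] * dx
    pN += new[M] * dx
    new[0] = 0.0
    new[M] = 0.0
    return new, p0, pN
```

Because both difference terms telescope, the quantity `dx * sum(p) + p0 + pN` stays constant from step to step, which is the conservation property required by the second implementation problem.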

The stability of the integration scheme was evaluated using the von Neumann
test, and the presented method is not stable if $\Delta t > (\Delta x)^2$. However,
the estimation of the stability region is complex, since the parameters of the
equation depend on the state of the model.



4 Diffusion Model of RED 

Consider a G/G/1/N queue with time-dependent input. In the diffusion model we
cannot distinguish the moments of arrivals. We simply consider the queue size
in intervals corresponding to mean interarrival times, $\Delta t = 1/\lambda$. The traffic
intensity $\lambda$ remains fixed within an interval. When $\lambda$ changes, the length
$1/\lambda$ of the interval is also changed. We know the queue distribution $f(x,t;x_0)$ at the
beginning of an interval. This distribution also gives us the distribution r(x, t)
of the response time of this queue. Let us suppose that T is the total delay between
the output of the RED queue and the arrival of the flow with new intensity, following
changes introduced by the control mechanism based on RED.
At the beginning of an interval i we compute the mean value of the queue,
$E[N_i]$, and the weighted mean

$$\eta_i = (1 - w)\,\eta_{i-1}\,p_i(0) + [1 - p_i(0)]\,\left[(1 - w)\,\eta_{i-1} + w\,E[N_i]\right]$$

and then we determine on this basis, using the drop function $d(\eta)$ of RED as
in Fig. 1, the probability of tagging the packet to announce the congestion.
Following it, we determine the changes of traffic intensity:

$$\lambda_{new} = [1 - d(\eta_i)]\,(\lambda_{old} + \Delta\lambda) + d(\eta_i)\,\lambda_{old}/a$$

but only in the case when the last change was done earlier than a predefined
silent period. The increase of flow is additive with $\Delta\lambda$ and the decrease of flow is
multiplicative with constant 1/a. This $\lambda_{new}$ will reach the queue after the round
trip time $E[N_i] \cdot 1/\mu + T$.
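
The control step of this section can be sketched in Python as below. The sketch takes the drop probability $d(\eta_i)$ as an input rather than fixing a particular drop function, and the names (`weighted_mean`, `new_intensity`, `controlled_intensity`, `round_trip_time`) as well as the bookkeeping of the last change time are assumptions made for illustration.

```python
def weighted_mean(eta_prev, p_empty, mean_queue, w):
    """eta_i from eta_{i-1}, the empty-queue probability p_i(0) and E[N_i]."""
    return ((1.0 - w) * eta_prev * p_empty
            + (1.0 - p_empty) * ((1.0 - w) * eta_prev + w * mean_queue))


def new_intensity(lam_old, drop_prob, delta_lam, a):
    """Additive increase by delta_lam, multiplicative decrease by 1/a."""
    return (1.0 - drop_prob) * (lam_old + delta_lam) + drop_prob * lam_old / a


def controlled_intensity(lam_old, drop_prob, delta_lam, a,
                         now, last_change, silent_period):
    """Apply the update only if the silent period has elapsed.

    Returns the (possibly unchanged) intensity and the time of the last change.
    """
    if now - last_change < silent_period:
        return lam_old, last_change
    return new_intensity(lam_old, drop_prob, delta_lam, a), now


def round_trip_time(mean_queue, mu, T):
    """Delay after which the new intensity reaches the queue: E[N_i]/mu + T."""
    return mean_queue / mu + T
```

With `drop_prob` equal to 0 the intensity grows by `delta_lam`; with `drop_prob` equal to 1 it is divided by `a`; intermediate values blend the two branches, as in the formula above.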



5 Diffusion Model of FECN/BECN Scheme 

Consider a slightly different control scheme: the traffic intensity $\lambda$ remains fixed
within a control interval D. For each $\Delta t = 1/\lambda$ (supposed moments of new
arrivals), we compute $E[N_i]$ and, if $E[N_i] >$ threshold, the counter is increased.
At the end of the interval the value of $\lambda_{new}$ is obtained according to the ratio
of marked packets, that is, the ratio of the counter content to the value
$D \cdot \lambda$ (the supposed number of packets that arrived during interval D):

$$\lambda_{new} = [1 - p]\,(\lambda_{old} + \Delta\lambda) + p\,\lambda_{old}/a$$

where:

$$p = \begin{cases} 0 & \text{if marked packet ratio} < \text{predefined value, e.g. } 0.5 \\ 1 & \text{otherwise} \end{cases}$$

This mechanism corresponds to the forward (or backward) explicit congestion
notification (FECN/BECN) scheme.
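
A minimal Python sketch of this counter-based rule follows. The function names and the way the per-arrival mean queue values are passed in as a list are illustrative assumptions, not part of the original model description.

```python
def count_marks(mean_queues, threshold):
    """Count the supposed arrival moments at which E[N_i] exceeds the threshold."""
    return sum(1 for m in mean_queues if m > threshold)


def fecn_becn_update(lam_old, marked_count, D, delta_lam, a, ratio_limit=0.5):
    """New intensity at the end of a control interval of length D.

    The marked packet ratio is the counter divided by D * lam_old, the supposed
    number of packets that arrived during the interval.
    """
    ratio = marked_count / (D * lam_old)
    p = 0 if ratio < ratio_limit else 1
    return (1 - p) * (lam_old + delta_lam) + p * lam_old / a
```

Unlike the RED update, the decision here is binary: the flow is either increased additively or decreased multiplicatively, depending on whether the marked packet ratio reaches the predefined value.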

6 Numerical Results 

Fig. 2 presents the studied model and Figs. 3-8 display some typical results. We
choose constant service time $1/\mu = 1$ as a time unit (t.u.). The buffer capacity is
N = 40. The values of other model parameters, i.e. delay T, thresholds thr_min,
thr_max, as well as control interval D and threshold thr for the FECN/BECN scheme,
are given in the captions of the figures.

Fig. 3 displays, in logarithmic and linear scales, examples of RED queue
distributions for two different times: when the queue is relatively lightly loaded
(t = 30 t.u.) and when it is overcrowded (t = 60 t.u.). The curves in logarithmic
scale show that, in spite of 100 000 repetitions of the experiment, the simulation
model still has problems with the determination of small values of probabilities.
The same remark may be made for Fig. 6, presenting loss as a function of time:
the simulation cannot give small loss probabilities, while diffusion is able to
furnish very small values (which are naturally approximate).

[Figure: block diagram of the investigated system, with source and receiver]

Fig. 2. Investigated system with RED mechanism (in case of FECN/BECN mechanism,
the queue has only one threshold)

Figs. 4 and 5 give diffusion and simulation results for the mean queue length as a
function of time and for the time-dependent traffic throughput resulting from the
RED mechanism. Poisson input stream (Fig. 4) and constant (deterministic)
stream (Fig. 5) are considered.

Fig. 7 presents the long-range performance of the RED mechanism. In the left figure
the overall loss ratio, taken in the time interval of T = 10 000 t.u. length, is
presented as a function of delay for two sets of parameters: (1) w = 0.002,
Pmax = 0.02 and (2) w = 0.2, Pmax = 0.5; silent period = 0. The constant w
is the weight with which the current value of the queue is taken into consideration
in $\eta$. For the first set of parameters the loss is high (the mean queue value
is nearly 35, see right figure) and practically does not depend on delay. For the
second set of parameters, the loss is lower and its growth with the value of
delay is distinctly visible. Simulation results are obtained in two ways: they
represent either real loss, i.e. the ratio of lost packets to the whole number of
arrived packets, or the overall probability of a full queue taken over the slots t,
where t = 1, 2, 3, ... denotes consecutive slots of unitary length. Although the
input stream is Poisson, the two results are not the same: as there is a permanent
transient state, the PASTA property does not hold. Diffusion approximation naturally
represents the second approach. For the first set of parameters, the diffusion curve is
between both simulation results; for the second set of parameters the probabilities
of a full queue obtained by diffusion and simulation are very close. In the right
figure the mean queue length and the corresponding moving average $\eta$, which
is the argument of the RED function $d(\eta)$, are displayed for the same two sets of
parameters (1), (2) as in the left figure. Only simulation results are given. It is
visible that the oscillations observed in previous figures, which are due to initial
conditions, especially the ones for the moving average $\eta$, attenuate with time.

Fig. 8 relates to the evaluation of the FECN/BECN mechanism. In the left figure
the throughput given by the diffusion model is compared with simulation results; in
the right figure we see the evolution of the simulated FECN mean queue for two different
thresholds compared to the RED mean queue (one of those displayed in Fig. 7).
Fig. 3. Queue distribution at t = 30 t.u. and t = 60 t.u., model parameters: thmin = 25,
thmax = 35, buffer capacity N = 40, initial condition N(0) = 0, Poisson input, silent
period = 0, delay = 5 t.u.; logarithmic and linear scale, diffusion and simulation results

Fig. 4. Mean queue length and changes of flow, diffusion and simulation results, Poisson
input, parameters as in Fig. 3

Fig. 5. Mean queue and changes of flow in simulation and diffusion model, deterministic
input flow, other parameters as in Fig. 3

Fig. 6. Loss as a function of time, parameters as in Fig. 3; diffusion and simulation
results; logarithmic and linear scale

Fig. 7. Long scale performance of RED with Poisson input; left: overall loss ratio as
a function of delay for two sets of parameters: (1) w = 0.002, Pmax = 0.02 and (2)
w = 0.2, Pmax = 0.5, silent period = 0, simulation and diffusion results; right: mean
queues and moving averages as a function of time, the same sets of parameters
(1) and (2) as in left, silent period = 5 t.u., delay = 5 t.u.; simulation results

Fig. 8. FECN/BECN performance; left: throughput as a function of time, simulation
and diffusion results; right: comparison of mean queues FECN (D = 5, thr = 30 or
thr = 35, delay = 0) and RED (w = 0.2, Pmax = 0.5, silent period = delay = 5 t.u.)
7 Conclusions

In this paper, we study with the use of diffusion approximation the impact of the
RED control mechanism on network performance. Diffusion approximation
has several natural advantages: flexibility and ease in analysing transient states,
in unifying separate models into queueing networks, in considering different queue
disciplines and control mechanisms, and in including time-varying input
streams in models. Here, it gives us a tool to investigate the dynamics of flow changes
introduced by the RED and FECN/BECN mechanisms and to see the influence of the
parameters of the RED drop function, of notification delay, etc. It is less costly than
simulation, especially when small probabilities are to be determined. As the numerical
examples demonstrate, it gives reasonable results for the considered cases. However,
one should be aware of the approximations made, all related numerical problems
and the need for careful software implementation.
Abstract. We present a simple and scalable admission control algorithm
for improving the Quality of Service (QoS) in Internet Service
Provider (ISP) networks. The algorithm is novel in that it does not make 
any assumptions regarding the underlying transport technology (works 
at the IP layer), requires simple data structures and is low in operational 
complexity, can handle IP network topology changes efficiently, and can 
help identify congested links in the network. We have verified the work- 
ing of this algorithm by simulation for arbitrary IP network topologies, 
and have found it to be successful in performing admission control and 
identifying congested links after route changes. 



1 Introduction 

IP networks are increasingly being used to transport different types of traffic 
such as voice, video, web and transactions. These different traffic types have 
different QoS requirements from the network, such as IP packet delay, jitter, 
loss and loss distribution. The Internet Engineering Task Force is developing 
the Differentiated Services approach ([?]) to provide the required QoS to the
different traffic types, or classes. Briefly, the scalable DiffServ approach involves 
restricting fine granularity conditioning and marking of traffic at network edges, 
and processing traffic aggregates in the network core. The network is assumed to 
be provisioned to support the QoS requirements of each traffic class by assuring 
the necessary resources (bandwidth and buffer space) in the network. 

While the Differentiated Services approach takes care of segregating different 
traffic classes and assuring minimum resources to each, it is also necessary to 
ensure that load within each traffic class remains within bounds such that the 
QoS requirements of that class are satisfied. An admission controller can be used 
to limit the load placed on each class. In the absence of such a mechanism, it 
is possible that the addition of a new flow within a class can result in degraded 
service to existing flows in that class. Additionally, the admission controller 
can be used to identify potentially over-loaded links, which can then be
re-provisioned to support the higher traffic load. Local information is not sufficient
to perform this task efficiently, and complete network resource information is
required, as illustrated in Figure 1. No capacity is assumed available between
nodes D and E, while capacity is available between nodes C and D. So flows
from A to F should be admitted while flows from A to G should be denied.
Local information at A is not sufficient to make the correct admission control
decision. It is important to note at this point that some traffic classes such as
best-effort may not need admission control.

We present a simple and scalable admission control algorithm for improv- 
ing the Quality of Service (QoS) in ISP networks for traffic classes that need 
an admission control strategy. The algorithm is novel in that it does not make 
any assumptions regarding the underlying transport technology such as ATM 
or MPLS, and works at IP layer. It considers global network resource informa- 
tion in the admission control decision process, and is scalable since it requires 
simple data structures to maintain this information. The algorithm is low in 
operational complexity. Coupled with network feedback, it can handle IP net- 
work route changes efficiently by updating network resource information and can 
help identify congested links. We have verified the working of this algorithm by 
simulation for arbitrary IP network topologies. 




Fig. 1. Local vs. Global Admission Control 



2 The Admission Control Algorithm 

2.1 Assumptions and Terminology 

Figure 2 illustrates the ISP network assumed in this work. It is composed of
multiple Points of Presence (PoPs), interconnected by a core network. As has
been mentioned before, specific link layer technologies are not required in the
PoP or core. A path is unidirectional and is defined by a pair of PoPs. Thus the
network comprises multiple paths, and each path comprises the sequence of
links on the path. A link is defined as an IP level hop connecting a pair of routers.
IP-level connectivity between PoPs is assumed to be provided by a dynamic IP
routing protocol. Hence IP route changes can result in a change in the set of
links comprising a path. The ISP is assumed to have defined the traffic classes
that it plans to support, and provisioned its network accordingly. QoS is
assumed to be a non-issue within each PoP, and the focus here is on assuring
QoS between the ISP PoPs. A traffic flow is defined as a stream of IP packets
between a pair of PoPs.

Fig. 2. Representative ISP IP Network Architecture

The admission control algorithm is intended for use within an ISP network
(domain). It can potentially reside in an admission controller that processes
requests from applications such as Voice over IP (VoIP) and IP Virtual Private
Networks (VPN). For example, in case of VoIP, a Call Agent ([2]) application
can interface with the admission controller for permission to set up VoIP calls
between end-hosts. A traffic class for VoIP would be needed in this case. In
case of IP VPN, the admission controller can reside in an IP VPN network
management system that receives customer requests for IP VPN setup, and
determines the feasibility of a request being supported by the ISP's existing
network. The request could be in the form of a traffic matrix specifying the
bandwidth requirements for each class between customer sites. Each site would
be considered connected to an ISP Point of Presence. A positive decision in
both the VoIP and IP VPN cases could result in the pertinent edge routers
being configured to permit the requested flows. The algorithm is designed to be
used only during the setup phase of a flow. In case of VoIP, this would be while
the call is being set up, or resources are being reserved for a large number of calls
between a pair of PoPs. In case of IP VPN, this would be while a VPN setup
request is being processed by the IP VPN management system.

2.2 Data Structures 

The admission control algorithm uses two main data structures, path and link,
to track network resource information. These data structures are designed to
reduce the amount of information required and the processing complexity. Unless
mentioned otherwise, information is maintained and processed only for those
classes that require admission control. The path data structure includes, for
each path, the set of links comprising the path, the allocated
bandwidth for each class, and the current status of the path ("route changed" flag).
The link data structure includes, for each link, the available
bandwidth on the link (set initially to the provisioned bandwidth) for each class,
and a field indicating the "changed" bandwidth (explained later) on the link for
each class.
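
One possible in-memory layout for these two structures is sketched below in Python. The class and field names (`Path`, `Link`, `allocated`, `available`, `changed`, `route_changed`) are hypothetical; the text only specifies what information each structure holds, not how it is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    """Per-link state, kept only for classes that need admission control."""
    available: Dict[str, float]  # class -> available bandwidth (initially provisioned)
    changed: Dict[str, float] = field(default_factory=dict)  # class -> "changed" bandwidth


@dataclass
class Path:
    """Per-path state for one ordered pair of PoPs."""
    links: List[str]  # identifiers of the links currently on the path
    allocated: Dict[str, float] = field(default_factory=dict)  # class -> allocated bandwidth
    route_changed: bool = False  # "route changed" flag
```

Keeping the link records free of any per-path detail means that their size grows with the number of links and classes only, not with the number of PoP pairs.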

2.3 Algorithm 

The admission control algorithm is used in processing a request that results in 
the addition or deletion of flows to or from the network. It is thus designed to 
be used on the control path of a flow during the setup phase, and not on the 
data path. Hence IP routers do not need to perform the admission control or 
be aware of such a process. They can focus on data forwarding, in keeping with 
the spirit of the DiffServ approach. The admission control algorithm is described 
below. 



Flow Admission Request

Input: Pair of PoPs for traffic flow, bandwidth requirement of
       each class in flow.
Output: Accept/Reject decision
Algorithm:
    for each traffic class {
        if (required bandwidth >= available bandwidth for this class
            on any link on path)
            reject request for this class;
            /* can identify link for upgrade/re-provisioning and
               re-run algorithm */
        else {
            admit this class;
            update path data structure to reflect allocated bandwidth;
            update link data structure to modify available bandwidth
            for each link on path;
        }
    }

Flow Release Request

Input: Pair of PoPs for traffic flow, traffic classes, allocated
       bandwidth for each class.
Algorithm:
    for each traffic class
        update link data structure to reflect released bandwidth for
        each link on path;
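
The two request handlers above can be sketched in Python as follows. Plain dictionaries stand in for the path and link data structures, and the names `admit` and `release` are hypothetical. One step is an assumption beyond the pseudocode: on release the sketch also reduces the bandwidth recorded for the path, so that the path record stays consistent with the links.

```python
def admit(paths, links, pop_pair, request):
    """Process a flow admission request.

    paths:   {pop_pair: {"links": [link ids], "allocated": {class: bandwidth}}}
    links:   {link id: {"available": {class: bandwidth}}}
    request: {class: required bandwidth}
    Returns {class: True if admitted, False if rejected}.
    """
    path = paths[pop_pair]
    decision = {}
    for cls, required in request.items():
        # Reject if any link on the path lacks capacity for this class.
        if any(required >= links[l]["available"][cls] for l in path["links"]):
            decision[cls] = False
            continue
        decision[cls] = True
        path["allocated"][cls] = path["allocated"].get(cls, 0.0) + required
        for l in path["links"]:
            links[l]["available"][cls] -= required
    return decision


def release(paths, links, pop_pair, released):
    """Process a flow release request; released is {class: bandwidth}."""
    path = paths[pop_pair]
    for cls, bandwidth in released.items():
        # Assumption: the path record is updated as well as the links.
        path["allocated"][cls] = path["allocated"].get(cls, 0.0) - bandwidth
        for l in path["links"]:
            links[l]["available"][cls] += bandwidth
```

Both handlers visit each link of one path once per traffic class, so their cost is proportional to the path length times the number of classes.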

2.4 Improving the Link Capacity Estimate 

The above algorithm relies on the accuracy of its estimate of link capacity for 
correct admission control decisions. In the presence of route changes, the traffic 
over network links can vary. It is essential for the algorithm to track these changes 
and incorporate the change in available capacity of links in its link data structure. 
The above algorithm can be significantly improved by periodically providing 
feedback from the network about current IP connectivity on all the paths in the 
network (i.e. between each pair of PoPs). This feedback can be generated by 
tools such as traceroute between each pair of PoPs, and sending the IP route 
information to the admission controller. The admission controller can use this 
information to identify changed paths (set “route changed” flag), update the set 
of links comprising each path in the path data structure, recompute the available 
capacity on the links and identify links that may be congested as a result of IP 
route change. At the end of every feedback period, the following algorithm can 
be used to update the data structures as a result of IP route changes. 

Link Capacity Estimation

    for all paths in path data structure {
        if path has changed {
            for all new links on path
                add bandwidth allocated to path to "changed"
                bandwidth in link data structure for each class;
            for all deleted links on path
                subtract bandwidth allocated to path from "changed"
                bandwidth in link data structure for each class;
        }
    }



The effect of route changes on available link bandwidth is now handled by
subtracting "changed" bandwidth from available bandwidth in the link data
structure for each class. The links with their "changed" bandwidth exceeding
the available bandwidth can be considered to be congested. These links can be
flagged by the admission controller for re-provisioning to handle the additional
traffic load. The "changed" bandwidth field provides an indication of the
magnitude of change in load on the links during a feedback period. If this functionality
is not required, the available bandwidth can be directly updated in the link
capacity estimation algorithm. Otherwise, the following algorithm can be used to
update the available capacity on the links.

122 K. Kim et al.



    /* continued from Link Capacity Estimation */

    for all links in link data structure {
        if ("changed" bandwidth >= available bandwidth)
            flag link as congested;
        subtract "changed" bandwidth from available bandwidth
        and store in available bandwidth in link data structure;
        reset "changed" bandwidth to zero;
    }
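
A Python sketch of one feedback-period update, covering both parts of the link capacity estimation, is given below. The dictionary layout and the name `apply_route_feedback` are hypothetical, and `observed_routes` stands for whatever route information the network feedback delivers for each pair of PoPs. The sketch deviates from the pseudocode in one labelled detail: a link is flagged only when its "changed" bandwidth is positive, so that untouched links with no spare capacity are not reported.

```python
def apply_route_feedback(paths, links, observed_routes):
    """Update path and link records after a feedback period.

    paths:  {pop_pair: {"links": [...], "allocated": {class: bw}, "route_changed": bool}}
    links:  {link id: {"available": {class: bw}, "changed": {class: bw}}}
    observed_routes: {pop_pair: [link ids currently on the path]}
    Returns a list of (link id, class) pairs flagged as congested.
    """
    for pair, path in paths.items():
        new_links = list(observed_routes[pair])
        if new_links == path["links"]:
            continue
        path["route_changed"] = True
        added = set(new_links) - set(path["links"])
        removed = set(path["links"]) - set(new_links)
        for cls, bandwidth in path["allocated"].items():
            for l in added:
                links[l]["changed"][cls] = links[l]["changed"].get(cls, 0.0) + bandwidth
            for l in removed:
                links[l]["changed"][cls] = links[l]["changed"].get(cls, 0.0) - bandwidth
        path["links"] = new_links

    congested = []
    for name, link in links.items():
        for cls, delta in link["changed"].items():
            # Deviation from the pseudocode: only positive changes can flag a link.
            if delta > 0 and delta >= link["available"][cls]:
                congested.append((name, cls))
            link["available"][cls] -= delta
        link["changed"] = {}  # reset "changed" bandwidth for the next period
    return congested
```

A negative available bandwidth after the update signals that the rerouted traffic exceeds what the link was provisioned for in that class.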

3 Discussion 

The admission request and release algorithms are both O(average path length
* number of traffic classes) in complexity. Since the average number of IP hops
between pairs of PoPs (path length) is typically small ([?]), and the number
of traffic classes that need admission control is also expected to be small to
ensure proper network resource distribution between the classes, the processing
overhead imposed by the algorithm is reasonable. The link capacity estimation
component, which can be used after every feedback period, is O(number of paths
* average path length * number of traffic classes) in complexity. Also, the
algorithm requires connectivity data from the network for every pair of PoPs. While
this component can introduce more overhead than the previous ones, the
frequency of the feedback can be reduced if the load on the admission controller or
other system components is found to be outside acceptable bounds. The
feedback frequency should be greater than the frequency of route changes within ISP
domains, and a feedback period on the order of 15 minutes should typically be
adequate.

The admission control process can consider the latest set of links comprising
a path, and their available bandwidth, due to the network feedback. The link
data structure does not keep track of specific paths using each link, which
simplifies the flow release process. IP route changes can be handled efficiently since
path and link information is decoupled into separate data structures. We have
verified the working of the algorithm using a simulation with arbitrary IP
network topologies. The C-based simulation used multiple pre-generated topologies
for a simple IP network, with each topology representing the connectivity (and
consequent route changes) between all pairs of PoPs after failure of a specific set
of links. Flow admission and deletion requests were generated and processed for
each PoP pair at the end of each feedback period, representing all the requests
generated during that feedback period. The simulation can be run for a specified
number of feedback periods, with a new topology being chosen randomly at the
end of each period. The algorithm was successful in performing admission
control and identifying congested links after route changes. We have incorporated
the algorithm in our Bandwidth Broker prototype, which is described in [?], [?]
and [?]. The prototype Bandwidth Broker is being used in an experimental IP
network [7] that is part of the Internet2 QBone, where it interfaces with various
applications such as VoIP and medical image transfer.
Abstract. Core routers cannot provide the generalized and flexible
computing power for supporting the substantial amount of processing needed
for services on the Internet, where configuration and processing are needed
per user for a large number of users. In this paper we propose an
architecture for a highly scalable broadband IP services switch which supports
per user services for a large number of users, provides for isolation
between users and between groups of users, provides security and
authentication services through tunneling protocols such as L2TP and IPSec,
and supports traffic management through policing and shaping for
fairness among different users. In this architecture the considerable
time-consuming, repetitive processing is performed in specialized hardware
while general purpose processing units are used for the computing power
needed to enable the desired services. Load sharing and load balancing
are used to distribute the computations between the general-purpose
processing units.



1 Introduction 

Historically, quality of service on the Internet has been what is called “best 
effort”. Best effort implies that the network only provides one class of service 
and that all the connections are treated equally when congestion arises. While 
this might have been acceptable for traditional Internet applications, it is highly 
inadequate for new real-time applications such as voice, video, webconferencing 
and other interactive applications. Yet these applications provide the greatest 
potential for the growth of services on the Internet. 

To support such services, network nodes must support differentiated per-user 
and per-packet processing. However, while the high-bandwidth core routers that 
are currently under deployment are optimized for performing large numbers of 
fast routing lookups, they are not expected to provide generalized and flexible 
computing power for supporting the substantial amount of processing needed 
for, among other things, per user and per packet processing. 

In order to emulate the services that subscribers currently obtain from
different networks, and in order to enhance those offerings on a single ubiquitous
network, a network infrastructure needs to support the following general
requirements.

1. Quality of Service: In order to support real-time applications, the Internet 
must provide differentiated quality of service (QoS). The Diffserv working 
group of IETF has defined the general architecture for differentiated ser- 
vices focusing on the definition and standardization of per-hop forwarding 
behaviors (PHBs). Network routers that provide differentiated services 
enhancements to IP must use the diffserv codepoint (DSCP) in the IP header 
to select a PHB for the forwarding treatment of that packet. In addition, 
packets that do not carry the diffserv codepoint but require differentiated 
service must first be marked with the appropriate DSCP before being for- 
warded. Marking of these packets is achieved using a classifier which must 
look up layer-3 and layer-4 headers in the packets. In order to support end- 
to-end QoS, it is almost mandatory to support multi-protocol label switching 
(MPLS) and traffic engineering extensions to OSPF or IS-IS, which carry 
topological information necessary for traffic engineering. 

2. Multi-protocol support: There are competing “last mile” technologies to- 
day which provide transport to the user for delivering packets to the “edge” 
of the Internet. To complete the communication, these packets need to be 
formatted to allow them to enter the Internet cloud and find their way to their 
respective destinations. The emergence of supporting protocols for new applications 
and the growth spurt in the number of users and the required bandwidth 
per user result in a very dynamic access environment. A network node must 
be capable of supporting all the requisite protocols in order to accommodate 
a variety of source packets. 

3. Privacy and security: Unlike Frame Relay, ATM networks, and private 
leased lines, IP networks do not establish a physical or logical circuit between 
the end users. Consequently, IP networks are prone to a myriad of security 
attacks, some of which have been well publicized in recent years. Fortunately 
this issue has received a great deal of attention from the research community, 
the standardization organizations, and across the industry. A number of IP 
services can be used to enhance the security and privacy across the Internet. 
These include encryption, firewalls, virtual private networks and network 
address translation. Virtual private network (VPN) services allow a private 
network to be configured within a public network. VPNs rely on some form 
of tunneling to create an overlay across the IP network by constructing virtual 
links between the end points of the tunnel. An IP packet is encapsulated 
in the header of another protocol (e.g., IP) and transported across the network 
using the forwarding mechanism of that protocol, independently of 
the address field of the encapsulated packet. A number of IP tunneling protocols 
have been introduced in recent years, including IP/IP, IP/GRE, L2TP, 
IP/IPSec, and MPLS. To allow VPNs to coexist within 
the public network, such tunneling protocols must be efficiently supported 
within the network nodes. Network address translation (NAT) and network 
address port translation (NAPT) allow for IP level access between hosts in 
an internal network and the Internet without requiring the hosts to have 
globally unique IP addresses. They also increase privacy and security as the 
addresses of the hosts on the internal network are hidden from the public 
network. 

4. Traffic management: When users are allowed access at high speeds, it is 
possible for a limited number of users demanding disproportionate amounts 
of bandwidth to disrupt service to other users. To ensure that large traffic 
bursts do not overload small users' buffers, and to ensure fairness among 
different users, traffic policing and shaping must be implemented. 

5. Secure segmentation: Isolation between users and between groups of users 
must be provided. 

6. On-Demand Subscription: On-demand subscription, provisioning, and 
activation of IP services allows the enterprise customer to respond quickly to 
unforeseen business needs without costly delays. This requires the dynamic 
provisioning of guaranteed QoS, privacy and security features across the 
network and to the enterprise customer’s site. 

7. Authentication, Authorization, and Accounting: Guaranteed, secure, 
and reliable service requires that the users be authenticated. This implies 
validating the identity of the users at call setup. Furthermore, authorization 
is required in order to determine the policy-based services the user can use. 
Finally, the user’s connection duration and type of service must be logged 
for accounting and billing purposes. 

8. Bandwidth and Latency: Finally, to enable the new services, large band- 
width and low latencies are critical. 
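
As an illustration of the marking step in requirement 1, the following is a minimal sketch of a classifier that inspects layer-3 and layer-4 header fields and assigns a DSCP to packets that arrive unmarked. The `Packet` type, the `RULES` table and the port numbers are hypothetical and chosen only for illustration; the DSCP values for EF and AF41 are the standard code points.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Standard DSCP code points.
DSCP_BEST_EFFORT = 0   # default PHB
DSCP_EF = 46           # expedited forwarding
DSCP_AF41 = 34         # assured forwarding, class 4, low drop precedence


@dataclass(frozen=True)
class Packet:
    """Hypothetical view of the header fields a classifier looks at."""
    src_ip: str
    dst_ip: str
    protocol: str                # layer-4 protocol name, e.g. "udp"
    dst_port: int
    dscp: Optional[int] = None   # None means "not yet marked"


# Hypothetical rule table: (protocol, destination port) -> DSCP.
RULES = {
    ("udp", 5060): DSCP_EF,      # assumed voice traffic
    ("tcp", 1720): DSCP_AF41,    # assumed conferencing traffic
}


def classify(pkt: Packet) -> Packet:
    """Mark an unmarked packet; packets that already carry a DSCP pass through."""
    if pkt.dscp is not None:
        return pkt
    dscp = RULES.get((pkt.protocol, pkt.dst_port), DSCP_BEST_EFFORT)
    return replace(pkt, dscp=dscp)
```

A forwarding engine would then select the PHB from `pkt.dscp`; that selection step is not shown here.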

In this paper we first describe the traditional approach to provisioning of IP 
services in the Internet. Next we propose a new architecture for an IP services 
switch which satisfies all the requirements listed above. To meet these require- 
ments, our proposed architecture is scalable not only in bandwidth, but also in 
processing power. 

2 CPE-Based IP Services 

The most common form of IP services deployed today uses customer premises 
equipment (CPE) devices at the customer site. Figure 1 below shows an example 
of IP services deployment using CPE devices. 

The CPE devices perform customer routing, firewalls, encryption, tunnel gen- 
eration and termination, as well as network address translation and some form 
of bandwidth management. While some form of bandwidth management device 
can be installed at customer premises, there is no end-to-end QoS provisioning. 
Once the traffic exits the customer’s network and enters the WAN, the QoS it 
receives is undetermined. 

Depending on the customer’s desired level of control and trust, in-house ex- 
pertise, and the cost of new applications, hardware, management and upgrades, 
the service provider (SP) provides services ranging from pure connectivity to full installation, 
configuration and management of these services. However, the general trend among 
enterprise customers has been to fully outsource these services. In addition to 
the fact that some services simply cannot be provided (e.g., QoS, on-demand 
subscription), the CPE-based approach has several economic and operational 
drawbacks, some of which are listed below. 

1. Expensive up-front capital investment is required that must be shared by 
the enterprise customer and the service provider depending on their arrange- 
ment. In addition, there are significant operational expenses associated with 
this approach. Deployment of CPE-based services requires a long lead-time. 
Service preparation involves an initial consultation to identify the customer’s 
needs, ordering and pre-configuring the CPE devices, and shipment. This is 
then followed by service rollout, which involves an on-site visit by technicians 
to install and integrate the devices. Maintenance and support of these 
services put a large burden on service providers. They must maintain an 
inventory of spares and a team of technicians to dispatch to the customer’s 
premises when failures occur or upgrades are needed. For each upgrade or 
repair the service is temporarily interrupted, a situation that may not be 
acceptable to the customers since it implies loss of secure connections. 

2. Addition of new services and functions requires additional hardware and/or 
software, which involve additional cost as well as costly staff trips to the site. 

3. In many cases, an enterprise has several sites, all of which require enhanced 
IP services. However, it is often too expensive and cumbersome to deploy 
these services at every site. Instead, services are deployed at one or two 
points of access into the public network. The traffic from the other sites 
is then backhauled to these sites and then onto the Internet resulting in 
increased traffic on the corporate network. 

4. Since no QoS is provisioned across network nodes, network-wide service level 
agreements cannot be implemented and supported. 

5. Since installation, management, and support are not centralized and must be 
per customer site, it is extremely difficult to scale this approach to a large 
number of customers. 

3 Network-Based IP Services 

A new approach to managed IP services is emerging in which these services are 
moved from the customer premises into the service provider’s (SP’s) realm of 
control (Figure 2). In this new paradigm, the intelligence of managed services 
resides in a reliable broadband aggregation and IP service creation switch, which 
is deployed at the provider’s point of presence (POP) replacing the high-capacity 
access concentration router. Hence an IP service creation switch is a broadband 
aggregation router, which is designed to enable value-added services in the public 
network. As such, it must be highly scalable in speed as well as processing power 
and it must have carrier-class reliability. To serve a large metropolitan area with 
a single IP service creation switch, it must be scalable to a large number of 
interfaces. These interfaces would be needed to handle the multiple sessions that 
can originate in a single household or business as well as those originating from 
wireless IP enabled devices. 

Deployment of IP service platforms provides a great deal of intelligence in 
the periphery of the network and eliminates the problems associated with the 
CPE-based approach. In particular, 

1. Service provisioning, policy configuration, management, and support can all 
be provided centrally from the SP’s site. 

2. Services can be rolled out by the SP at each customer site (rather than at 
one or two sites) rapidly and cost effectively. 

3. The centralized management allows the service providers to monitor per- 
formance against their service level agreement (SLA) commitments thereby 
enabling them to deliver on those commitments. 

4. Enterprise customers can initiate and configure new services, obtain 
detailed network and service performance information, and verify that 
their SLA is maintained, all through a browser-based interface. 

5. The ease of deploying services and the ability to provision end-to-end QoS 
make viable new business models such as the Application Service Provider 
(ASP) model. 

In the following section we describe our proposed architecture for such an IP 
service creation switch. 



3.1 Proposed Architecture 

Our discussion in the previous sections makes it clear that the IP service creation 
switches are different from the high bandwidth core routers in that in addition 
to performing time consuming, repetitive processing for large numbers of routing 
lookups required of core routers, they must also provide scalable, generalized and 
flexible computing power which must be easy to program for, among other things, 
per-user and per-packet processing. Our architectural philosophy maintains a 
balance between these two types of processing. The need for considerable time- 
consuming repetitive processing, which has proved to create a bottleneck in the 
traditional routers, is addressed through specialized hardware. This results in 
dramatic increases in speed and reductions in delay. On the other hand, the 
need for flexible, easy to use computing power to enable IP services is addressed 
through the provision of high-performance general purpose processors which are 
paralleled and can be scaled to a virtually limitless degree. This architecture 
is designed to provide scalability in speed/bandwidth, state-space/memory, and 
processing power. 

Figure 3 shows a block diagram of our proposed architecture. Packets enter 
and exit the switch through media specific physical connection (PHY) cards. The 
packet entering a PHY card is delivered to the ingress side of its associated line 
card. After some initial processing, the line card distributes the received packets 
to a particular Internet Processor Engine (IPE) card through the switch fabric. 
After performing the necessary processing, the IPE card sends the packet back 
through the switch fabric to the egress side of one of the line cards for further 
processing before allowing the packet to exit the system from the associated 
PHY card. 




Fig. 3. The system architecture overview. 



All the line cards contain identical hardware, but are independently pro- 
grammable. Similarly all the IPE cards have identical hardware, but are inde- 
pendently programmable. This makes for a simple design and contributes to the 
scalability of the architecture. If additional processing power is needed, addi- 
tional IPE cards can be added. Additional users can be supported by adding 
more line cards and IPE cards. 

In general, each line card performs a number of functions. Initially, the line 
card ingress converts the variable-length input packets into a number of 64-byte 
fixed-length cells. The stream of cells is examined “on the fly” to obtain 
important control information including the protocol encapsulation sequence 
for each packet and those portions of the packet which should be captured for 
processing. The control information is then used to reassemble the packet and 
to format it into a limited number of protocol types supported by the IPE 
cards. Thus, while any given line card can be configured to support packets 
having a number of protocol layers and protocol encapsulation sequences, the 
line card is configured to convert these packets into generally non-encapsulated 
packets of a type that is supported by each of the IPE cards. The line card sends 
the reassembled and reformatted packet into the switch fabric (in the form of 
continuous fixed length cells) for delivery to the IPE card that was designated 
for further processing of that packet. 

Although the fixed-length cells which comprise a packet are arranged back 
to back when sent across the switch fabric, the cells may become interleaved 
with other cells destined for the same IPE card during the course of traversing 
the switch fabric. As a result, the cell stream delivered to an IPE card is in 
fact an interleaved cell stream. Thus the IPE card will first examine this cell 
stream “on the fly” (much like the line card) to ascertain important control 
information. The IPE card then processes this information to perform routing 
lookup as well as other mid-network processing functions such as policing, packet 
filtering, PHB scheduling, etc. The control information is also used by the IPE 
card to reassemble the packet according to the packet’s destination interface. 
The IPE card then sends the reassembled and reformatted packet back into the 
switch fabric for delivery to the egress side of one of the line cards (or to one of 
the IPE cards if the packet requires additional processing). 
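
The segmentation and reassembly steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the switch's actual cell format: each cell is modeled as a hypothetical tuple `(packet_id, seq, total_len, payload)`, the 64 bytes are treated as payload only, and the fabric's interleaving is imitated by a simple round-robin merge.

```python
from collections import defaultdict
from itertools import zip_longest

CELL_SIZE = 64  # bytes per fixed-length cell, as in the text


def cell_count(length):
    """Number of cells needed for a packet of `length` bytes (at least one)."""
    return max(1, -(-length // CELL_SIZE))


def segment(packet_id, data):
    """Split a variable-length packet into padded fixed-length cells."""
    cells = []
    for seq in range(cell_count(len(data))):
        chunk = data[seq * CELL_SIZE:(seq + 1) * CELL_SIZE]
        # Pad the last cell so that every cell carries exactly CELL_SIZE bytes.
        cells.append((packet_id, seq, len(data), chunk.ljust(CELL_SIZE, b"\x00")))
    return cells


def interleave(*streams):
    """Round-robin merge, standing in for interleaving inside the fabric."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(cell for cell in group if cell is not None)
    return merged


def reassemble(cell_stream):
    """Rebuild packets from an interleaved cell stream, keyed by packet id."""
    pending = defaultdict(dict)
    packets = {}
    for packet_id, seq, total_len, payload in cell_stream:
        pending[packet_id][seq] = payload
        if len(pending[packet_id]) == cell_count(total_len):
            parts = pending.pop(packet_id)
            joined = b"".join(parts[i] for i in range(len(parts)))
            packets[packet_id] = joined[:total_len]  # strip the padding
    return packets
```

The per-packet dictionary plays the role of the control information that the receiving card extracts "on the fly" in order to separate cells belonging to different packets.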

On the egress side, the line card will again examine the interleaved cell stream 
and extract the necessary control information. The control information is then 
used to reassemble and format the packets for their destination interfaces. Addi- 
tional processing of the outbound packets such as shaping and PHY scheduling 
is also performed on the line card egress side. 

3.2 Data and Control Path 

The line cards (on the ingress and egress sides) and the IPE cards host a flexible 
protocol-processing platform. This platform is comprised of the data path pro- 
cessing and the protocol path processing. The separation of data path processing 
from protocol processing leads to the separation of memory and compute inten- 
sive applications from the flexible protocol processing requirements. A clearly 
defined interface in the form of dual-port memory modules and data structures 
containing protocol-specific information allows the deployment of general-purpose 
central processing units for the protocol processing units (PPUs). These 
PPUs support the ever-changing requirements of packet forwarding based on 
multiple protocol layers. 

Figure 4 illustrates the protocol processing platform. The protocol path processing 
consists of a number of PPUs and can be configured for multiple purposes 
and environments. One of these PPUs is reserved for maintenance and control 
purposes and is denoted as the Master PPU (MPPU). 

The data path processing unit, which is implemented in specialized hardware, 
consists of the packet inspector, the buffer access controller and the packet man- 
ager. This unit extracts, in the packet inspector, all necessary information from 
the received packets and passes this information on to a selected PPU via the 
buffer access controller. Furthermore, the packet inspector segments the variable-length 
packet into 64-byte fixed-length cells. The cells are stored in the cell buffer 
and linked together as linked lists of cells. Once a PPU has selected a packet 
for transmission it passes the pointer to the packet as well as the necessary for- 
matting and routing information to the data path processing unit. This enables 
the formatting and the segmenting of the packet. The packet is then forwarded 
either as a whole (from the line card egress to PHY card) or segmented (from 
the line card ingress to IPE card and from the IPE card to line card egress) 
based on the configured interface. 

Fig. 4. Protocol processing platform on the line cards and IPE cards. 



The set of all ingress users on the system is distributed as evenly as pos- 
sible across all the IPE cards in the system. This is achieved by the line card 
which distributes the received packets based on user or tunnel information to a 
particular IPE card (Figure 5). Within an IPE, the MPPU stores the per-user 




Fig. 5. Load sharing between the IPE cards. 



information for the users assigned to that IPE and distributes those users across 
all the PPUs on that IPE (Figure 6). Each PPU stores a copy of the per-user 
information assigned to it. Thus each user is associated with one and only one 
IPE card and one and only one PPU on that IPE. This ensures that packet order 
is maintained. The procedure of forwarding a packet to a particular IPE card 
and PPU is denoted as load sharing. The key benefits of load sharing are the 
following. 

1. Separation of the data path from the protocol processing path. 

2. Incremental provisioning of compute power per packet. 

3. Load distribution based on the packet computational needs for a particular 
user or tunnel. 

4. User and tunnel information can be maintained by a single processor thus 

(a) minimizing the inter-process communication needs, 

(b) allowing the portability of single processor application software onto the 
system. 
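
The two-level assignment behind load sharing can be sketched as below. The text only states that users are spread as evenly as possible and that each user is pinned to one IPE card and one PPU; the least-loaded selection rule and the `LoadSharer` class are assumptions made for illustration.

```python
class LoadSharer:
    """Pins every user (or tunnel) to exactly one (IPE card, PPU) pair."""

    def __init__(self, n_ipe, ppus_per_ipe):
        # ppu_load[i][p] = number of users assigned to PPU p on IPE card i.
        self.ppu_load = [[0] * ppus_per_ipe for _ in range(n_ipe)]
        self.assignment = {}

    def lookup(self, user_id):
        """Return the (ipe, ppu) pair for a user, assigning one on first use."""
        if user_id not in self.assignment:
            # Line-card level: pick the IPE card with the fewest users.
            ipe = min(range(len(self.ppu_load)),
                      key=lambda i: sum(self.ppu_load[i]))
            # MPPU level: pick the least-loaded PPU on that IPE card.
            ppu = min(range(len(self.ppu_load[ipe])),
                      key=lambda p: self.ppu_load[ipe][p])
            self.ppu_load[ipe][ppu] += 1
            self.assignment[user_id] = (ipe, ppu)
        return self.assignment[user_id]
```

Because the mapping is stored rather than recomputed per packet, all packets of a user follow the same path, which is what preserves packet order.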




Fig. 6. Processing flow. 



3.3 Scheduling and Bandwidth Distribution 

As mentioned previously, the set of ingress users is distributed as evenly as 
possible across all the IPE cards in the system. The forwarding operation is then 
performed on the IPE cards where each packet is forwarded to an appropriate 
line card egress (for further transmission to the PHY port) depending on the 
IP destination address of the packet. Given the bursty nature of the Internet 
traffic, it is difficult to predict the traffic load from IPE cards to line cards. In 
particular, over short periods of time, the total traffic from IPE cards destined 
to a line card may exceed the capacity of the corresponding switch fabric link. 
In the absence of a scheduling mechanism, the transmitted cells will be dropped 
by the switch fabric. To prevent this, we have devised a scheduling algorithm 
which, in response to the requests from line cards and IPE cards, distributes the 
switch fabric link bandwidths to them. This algorithm is referred to as bandwidth 
distribution in the following. 

Bandwidth distribution is performed on one of the MPPUs, which is designated 
as the master MPPU. Time is divided into intervals called cycles. At the 
beginning of each cycle each card transmits its request or demand (determined 
by the amount of traffic in its buffers) to the master MPPU. To support differentiated 
quality of service we assume that traffic consists of P different priority 
classes and that transmitted requests from the cards are per-priority class. The master 
MPPU computes a per-priority grant for each card and transmits these grants 
to the cards. Each card can then transmit traffic of each priority class up to the 
amount of grant it has received for that class. This scheme resolves contentions 
within a cycle; in principle, all the cells that are scheduled for a given cycle 
should be able to reach their destination cards by the end of that cycle. The 
algorithm is described next. 

Let t denote time in terms of the number of cycles and let $d_{ij}(t,p)$, $i,j = 
1,2,\ldots,N$, $p = 1,2,\ldots,P$, denote the demand of priority class p from card i 
for card j, where N is the total number of cards. The $d_{ij}(t,p)$'s are calculated 
based on the buffer occupancies on each card. At the end of each cycle, each 
card transmits its demand to the master MPPU. The master MPPU forms P 
demand matrices $D(t,p) = [d_{ij}(t,p)]$, $p = 1,2,\ldots,P$, from which it calculates P 
grant matrices $G(t+1,p) = [g_{ij}(t+1,p)]$, where $g_{ij}(t,p)$ denotes the bandwidth 
grant for traffic class p assigned to card i and destined to card j during the cycle 
t. 

Grants are allocated based on strict order of priority. The bandwidth distri- 
bution algorithm tries to fulfill the demands of the highest priority class first. 
If any additional bandwidth is left on any of the input or output links, it is 
assigned to the next priority class according to their demands. In the following 
we describe an algorithm for computing the grant matrices. To simplify nota- 
tion we remove the time and priority indices of the request and grant matrices. 
Furthermore each switch fabric port is assumed to have a capacity of C cells/sec. 

Bandwidth distribution algorithm 

Step 0 

Given the demand matrix D, calculate the total demand of every row and column, 
denoted by $u_i^{(0)} = \sum_j d_{ij}$ and $v_j^{(0)} = \sum_i d_{ij}$, respectively. Calculate $a_0$, where 

$$a_0 = \min\left\{ \frac{C}{u_i^{(0)}},\ \frac{C}{v_i^{(0)}},\ i = 1,2,\ldots,N \right\}. \qquad (1)$$

Each input port can be assigned a bandwidth of $a_0$ per unit of demand for each output port 
without violating the capacity constraint of any input or output link. Suppose the 
minimum is achieved for row k. Then $a_0 = C/u_k^{(0)}$. For $j = 1,2,\ldots,N$, let 
$g_{kj} = a_0 d_{kj}$. Row k (input link k) has now reached its maximum capacity and 
is eliminated from further consideration. The situation is similar if the minimum 
is achieved for a column. All other input/output demands receive the 
same scaling factor $a_0$ at this stage. 

Step l, $1 \le l \le 2N - 1$ 

Up to this point l rows or columns have been eliminated. Calculate the unfulfilled 
demands for every row and column. If a row, say row m, was eliminated in step 
$l-1$, then $u_i^{(l)} = u_i^{(l-1)}$ and $v_i^{(l)} = v_i^{(l-1)} - d_{mi}$, $i = 1,2,\ldots,N$. Similarly, if a 
column, say column n, was eliminated in step $l-1$, then $u_i^{(l)} = u_i^{(l-1)} - d_{in}$ and 
$v_i^{(l)} = v_i^{(l-1)}$, $i = 1,2,\ldots,N$. Calculate the residual bandwidth on each row and 
column, written here as $R_i^{(l)}$ and $S_i^{(l)}$: 

$$R_i^{(l)} = R_i^{(l-1)} - a_{l-1} u_i^{(l-1)}, \qquad S_i^{(l)} = S_i^{(l-1)} - a_{l-1} v_i^{(l-1)},$$

where $R_i^{(0)} = S_i^{(0)} = C$ for all i. Now evaluate the maximum bandwidth increment, 

$$a_l = \min\left\{ \frac{R_i^{(l)}}{u_i^{(l)}},\ \frac{S_i^{(l)}}{v_i^{(l)}},\ i = 1,2,\ldots,N \right\}, \qquad (2)$$

where the minimum is taken over the rows and columns not yet eliminated, 
and allocate bandwidth $\sum_{t=0}^{l} a_t$ proportionally to all the requests of the row or 
column that achieves the minimum, i.e., if the minimum is achieved for, say, row 
r, then $g_{rj} = \left(\sum_{t=0}^{l} a_t\right) d_{rj}$ for all j. 

If only one row or column is left, we allocate all the residual capacity to the 
requests in that row or column keeping in mind the capacity constraint of the 
switch fabric input and output ports. 
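
The iteration above can be sketched in a few lines for a single priority class. This is an illustrative reading of the algorithm, not the authors' implementation, and it rests on three assumptions. Grants scale with demand and are not capped by it, following equations (1) and (2) literally. The final step is handled by simply continuing the same proportional rule until no demand remains. Row and column sums are recomputed in every iteration for clarity, so the sketch does not attain the operation count stated for the algorithm. The function name `distribute_bandwidth` is hypothetical.

```python
def distribute_bandwidth(demand, capacity):
    """Return a grant matrix for an N x N demand matrix and link capacity C."""
    n = len(demand)
    grant = [[0.0] * n for _ in range(n)]
    rows, cols = set(range(n)), set(range(n))   # not yet eliminated
    res_row = [float(capacity)] * n             # residual input-link bandwidth
    res_col = [float(capacity)] * n             # residual output-link bandwidth
    scale = 0.0                                 # running sum a_0 + ... + a_l

    while rows and cols:
        # Unfulfilled demand of each remaining row (u) and column (v).
        u = {i: sum(demand[i][j] for j in cols) for i in rows}
        v = {j: sum(demand[i][j] for i in rows) for j in cols}
        candidates = [(res_row[i] / u[i], "row", i) for i in rows if u[i] > 0]
        candidates += [(res_col[j] / v[j], "col", j) for j in cols if v[j] > 0]
        if not candidates:
            break                               # no demand left to serve
        a, kind, k = min(candidates)            # maximum bandwidth increment
        a = max(0.0, a)                         # guard against rounding error
        scale += a
        # Every remaining pair (i, j) consumes a * d_ij in this step.
        for i in rows:
            res_row[i] -= a * u[i]
        for j in cols:
            res_col[j] -= a * v[j]
        # Fix the grants of the saturated row or column and eliminate it.
        if kind == "row":
            for j in cols:
                grant[k][j] = scale * demand[k][j]
            rows.remove(k)
        else:
            for i in rows:
                grant[i][k] = scale * demand[i][k]
            cols.remove(k)
    return grant
```

For the 2 x 2 demand matrix [[1, 1], [1, 3]] with C = 10, this sketch returns the grants [[7.5, 2.5], [2.5, 7.5]], so every input and output link is fully used. For P priority classes, one would run the same routine per class in strict priority order, passing on the residual capacities; that extension is not shown.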

The above algorithm has the following interesting properties. 

1. Except for the last step of the algorithm, all input and output ports corre- 
sponding to the rows and columns which are eliminated have 100% utiliza- 
tion. 

2. The algorithm is fair in the following sense. Except for the input/output 
pairs which receive bandwidth allocation in the last step, in every step of 
the algorithm the grant allocated to the input/output pairs is proportional 
to their requests. This property does not hold for the last row or column since 
in that case all the residual capacity is assigned to the remaining requests. 

3. For an N × N matrix, there are at least N and at most 2N iterations. It can 
be shown that this algorithm requires $O(N^2)$ additions and multiplications. 
This is the lowest complexity that any centralized algorithm can achieve. 



Credits and grants 

If grants are generated in direct response to demands from cards, then each 
cell must undergo a latency of at least one cycle. In general the length of a cycle 
(on the order of milliseconds) is too large compared to the delay tolerance of real-time 
traffic. In order to eliminate this latency we issue every card a fixed credit line 
of $L_p$ cells for the priority class p. A card that has no grants can transmit up 
to $L_p$ cells from priority class p without having obtained any grants for it. The 
algorithm is described in the following. 

Consider priority class p and cycle t. Let $B_{ij}(t,p)$ and $C_{ij}(t,p)$ denote the 
buffer occupancy and the credit of card i for card j at the end of cycle t, respectively. 
Also let $T_{ij}(t,p)$ and $A_{ij}(t,p)$ denote the number of cells transmitted and 
the number of cells that arrived, respectively, during this cycle. Then the buffer 
occupancy of card i for card j at the end of cycle t is given by 

$$B_{ij}(t,p) = \max\{0,\ B_{ij}(t-1,p) - T_{ij}(t,p) + A_{ij}(t,p)\}. \qquad (3)$$

Furthermore, the remaining credit at the end of cycle t is given by 

$$C_{ij}(t,p) = \min\{L_p,\ C_{ij}(t-1,p) - T_{ij}(t,p) + g_{ij}(t,p)\}. \qquad (4)$$

The number of cells that can be transmitted during a cycle t is then given by 

$$T_{ij}(t,p) = \min\{C_{ij}(t-1,p),\ B_{ij}(t-1,p)\}. \qquad (5)$$

The new demand is now calculated such that if granted, the card will have a 
credit equal to $L_p$ plus the traffic remaining in its queue from the previous cycle, 
i.e., 

$$d_{ij}(t,p) = L_p + B_{ij}(t,p) - C_{ij}(t,p). \qquad (6)$$

The initial conditions are given by 

diji^iP^ — — djp. 

Having obtained its grant gij{t,p) from the master MPPU, card i computes 
its new credit for the next cycle. It then transmits traffic up to the level of its 
credit from its buffers. 
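
The per-cycle bookkeeping of equations (3) to (6) can be traced with a short simulation for one (card i, card j, class p) triple. This is a sketch under explicit assumptions: equation (5) is taken to be the minimum of the previous credit and the previous buffer occupancy, the initial state is assumed to be an empty buffer with a full credit line, and the master MPPU is idealized as granting each demand in full in the following cycle. The names `CreditState`, `step` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CreditState:
    limit: int        # credit line L_p
    backlog: int = 0  # buffer occupancy B_ij at the end of the last cycle
    credit: int = 0   # credit C_ij at the end of the last cycle


def step(state, arrivals, grant):
    """Advance one cycle; return (cells sent, new demand)."""
    sent = min(state.credit, state.backlog)                  # eq. (5), assumed form
    backlog = max(0, state.backlog - sent + arrivals)        # eq. (3)
    credit = min(state.limit, state.credit - sent + grant)   # eq. (4)
    demand = state.limit + backlog - credit                  # eq. (6)
    state.backlog, state.credit = backlog, credit
    return sent, demand


def simulate(limit, arrivals):
    """Trace (sent, backlog, credit, demand) per cycle for an arrival sequence."""
    state = CreditState(limit=limit, credit=limit)  # assumed initial conditions
    grant, trace = 0, []
    for arrived in arrivals:
        sent, demand = step(state, arrived, grant)
        trace.append((sent, state.backlog, state.credit, demand))
        grant = demand  # idealized master: full demand granted next cycle
    return trace
```

With a credit line of 4 cells and arrivals of 6, 0 and 3 cells, the trace is (0, 6, 4, 6), (4, 2, 4, 2), (2, 3, 4, 3): the card sends up to its credit line in each cycle without waiting for the corresponding grant, and the demand asks for enough credit to restore the line and cover the backlog.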

We have performed a set of simulations in order to measure throughput and 
delay in different parts of the system and to evaluate the effectiveness of our 
approach in reducing the switch fabric contention. Our results show that this 
approach achieves nearly 100% throughput and can easily support differentiated 
quality of service in terms of latency for different classes of traffic [2]. 

4 Conclusion 

In this article, we present an architecture that meets the stringent requirements 
of an IP service creation switch. The proposed architecture enables the support 
of user management, tunnel management, logical link management, per-user 
policing, per-user traffic shaping, per-user QoS control with Diffserv support, 
per-user buffer management, per-user packet classification, packet filtering, NAT, 
and management database support, all for a very large number of users. The 
simultaneous control plane support of the requisite protocols such as the routing 
protocols, the tunneling protocols and the QoS protocols makes this a very well- 
matched architecture for a scalable IP services switch. 



Abstract. We address the problem of maintaining logical time in a distributed 
system operating over a special type of network: one whose 
underlying graph is represented by a fully connected mesh of star-topology 
subgraphs. Such a graph is generated, for example, by a wireless IP net- 
work in which mobility service agents, interconnected by an IP backbone, 
provide wireless connectivity to mobile hosts currently present in the as- 
sociated location areas. We discuss the complementary logical clocks for 
the wireline and wireless segment of the network and their integration 
into an isomorphic logical time system. 



1 Introduction 

Tracking causal relations between activities occurring in distinct physical loca- 
tions plays a critical role in monitoring, analyzing and debugging the behavior 
of distributed systems. Yet understanding a distributed execution, detecting 
causality or concurrency (i.e., causal independence) between local events, and 
accounting for the corresponding non-determinism present a formidable engineering 
challenge. The task becomes notoriously difficult as the complexity of 
the underlying communication network and the number of potentially interacting 
sites grow. 

Logical time provides a standard tool to characterize causality in a distributed 
system. A logical clock protocol, which runs on top of the regular message-passing 
communication, assigns a timestamp to each event. Comparing the timestamps 
allows one to draw conclusions (deterministically, with quantitative probability, 
or with some likelihood) about the existence of a causal relation between the 
events. The simplest causal relation, the happened-before relation, which has 
been defined for processes with totally ordered events, requires a vector 
clock to characterize causality in an isomorphic fashion. This mechanism 
employs event timestamps and message tags represented by integer vectors whose 
size is equal to the number of processes in the system. For many applications, 
such a cost appears intolerably high. In response to the overhead problem, 
research activities have focused on efficient encoding schemes, the trade-offs 
between online and offline communications, and the trade-offs between clock 
accuracy and complexity [8,9]. In a parallel development, a generalized approach 
to causality has been sought with the objective of extending the methodology to 
processes with partially ordered events as well as to special 
types of processes and underlying networks. 
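
For readers unfamiliar with the vector clock mechanism mentioned above, the following is a minimal sketch of the standard textbook protocol, not of the clocks developed in this paper. Each process keeps an integer vector with one entry per process, attaches it to outgoing messages as a tag, and merges incoming tags by component-wise maximum. The class and function names are chosen for illustration.

```python
class VectorClock:
    """Standard vector clock for process `pid` in a system of `n` processes."""

    def __init__(self, n, pid):
        self.pid = pid
        self.v = [0] * n

    def local_event(self):
        """Tick the local component and return the event's timestamp."""
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        """A send is an event; its timestamp is also the message tag."""
        return self.local_event()

    def receive(self, tag):
        """Merge the tag by component-wise maximum, then tick."""
        self.v = [max(a, b) for a, b in zip(self.v, tag)]
        return self.local_event()


def happened_before(a, b):
    """True if the event stamped a causally precedes the event stamped b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

The vector length equals the number of processes, which is the overhead the text refers to: every timestamp and every message tag grows linearly with the system size.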

In the present work, we focus on the isomorphic tracking of causality in a 
large distributed system running over an IP network with a wireless mobile 
access segment. In such a network, a mobility service agent (MSA) provides 
wireless access over radio or infrared links to a group of mobile users (nodes) 
in its location area, or subnet. The MSAs themselves are interconnected by a 
fixed wireline network. A mobile node can communicate with the other nodes, 
both mobile and static, exclusively via the MSA of its current location area. 
When a node moves from one location area to another, the link with the first 
MSA is brought down and a new link with the MSA of the second location area 
is established. The model is applicable to the advanced wide-area wireless sys- 
tems, as well as to corporate wireless LANs. Depending on the actual operating 
environment, the MSA function can be represented by a base station, a wireless 
router, a radio network controller, a packet radio support node, or a packet data 
service node. 

In a distributed system running over such a network, an MSA with the mo- 
bile nodes logically linked to it may be viewed as an abstracted process with a 
partially ordered event set. The overall task of characterizing causality can be 
partitioned into the smaller tasks applied to the set of abstract processes, on one 
hand, and a set of the mobile nodes of the same location area, on the other hand. 
According to this partition, we define a system model and introduce a refined 
definition of the causal precedence relation for a distributed system containing 
mobile nodes interacting via mobility service agents, develop two separate logical 
clocks for the wireline and wireless segments of the network, and show how they 
interact to form an integrated isomorphic logical time system. 

2 System Model 

Fixed and Mobile Network Segments 

We consider a distributed system containing interacting processes running at 
mobile nodes (MN) and mobility service agents (MSA), as shown in Fig. 1. When 
an MN moves into a location area associated with a specific IP subnet, it uses 
a registration mechanism to establish its presence with the MSA of this subnet 
and to obtain a temporary care-of IP address PI- All communications within 
a subnet are carried over a set of radio or infrared links between the MSA and 
mobile nodes. The MSA, which is connected to a routing node of an underlying 
IP network of arbitrary topology (shown with dashed lines in Fig. 1), 
handles all incoming and outgoing traffic of a mobile node. Specific network 
topology as well as the care-of address assignment and tunneling mechanisms 
remain fully transparent to the mobile applications. The set of mobility service 
agents is modeled as the vertex set of a fully connected graph. Each edge of the 
graph represents a full-duplex, non-FIFO channel with a non-zero probability of 
message loss (solid lines). 




Fig. 1. Distributed system running over a mobile IP network: MN - mobile node, MSA 
- stationary mobility service agent. 



Each subnet containing a mobility service agent and a number of mobile 
nodes is modeled as a star-topology graph, where each edge corresponds to an 
error-free full-duplex FIFO channel with a non-zero probability of message loss 
in either direction. The latter property reflects possible link impairment due to 
landscape features or foreign objects moving into the signal propagation path. 

In order to transmit and receive messages, a mobile node has to be registered 
with an MSA. At most one registration can be valid at any given time. When a 
mobile node moves from one subnet to another, its link to the mobility agent of 
the first subnet is brought down, and a new link with the mobility agent of the 
second subnet is established. In the course of such transition, a mobile node may 
have to obtain a new care-of IP address. The details of this mechanism including 
the signaling message exchange also remain transparent to the applications. 



Event Types and Causality Relations 

A process in a distributed system is the program code running either at a mobile 
node or a mobility service agent. Any change in a process state constitutes 
an event. Without loss of generality, we further consider only communication 
events, i.e., ones associated with sending or receiving an information message. 
Events occurring at any given process (MN or MSA) are proper events of that 
process. Proper events of any process are sequentially ordered. An example of a 
distributed system with communication events is shown in Fig. 2. 

An event occurring at an MN is either a message send event (NSEND) or a 
message receive event (NRECV). Registration represents a special type of event 
that involves an exchange of signaling messages between the node and mobility 
service agents. Events occurring at an MSA are classified as either MSA-IN, 
MSA-OUT, or MSA-RELAY. An MSA-IN event is atomically composed of 
receiving a message over the fixed part of the network and forwarding it to an 
MN in the location area of the given MSA. Similarly, an MSA-OUT event is 
composed of receiving a message from an MN over a wireless link and 
forwarding it over the fixed network, whereas an MSA-RELAY event is a 
combination of receiving a message from one MN and forwarding it to another 
MN in the agent’s location area. 

Fig. 2. An example of communication in a system with three mobile nodes and two 
mobility service agents. 

Definition 1 (Elementary internal causality) For a given mobility service 
agent A, an elementary internal causal relation exists between events e and f, if 
one of the following conditions holds: 

- events e and f belong to the same mobile node P, e precedes f in the ordered 
sequence of P’s events, and P is registered with A at the occurrence of event e 
as well as at the occurrence of event f; 

- event e is an NSEND event occurring at a mobile node while this node is 
registered with A and f is the corresponding MSA-OUT or MSA-RELAY 
event of A; 

- event e is an MSA-IN or MSA-RELAY event of A and f is the corresponding 
NRECV of a mobile node occurring while this node is registered with A. 

Definition 2 (Internal causality) The internal causal relation is the 
transitive closure of the elementary internal causal relation. The internal 
causality with respect to a given mobility service agent A between events e and 
f is denoted $e \xrightarrow{A} f$. 

Definition 3 (External causality) The external causal relation exists between 
events e and f, if e is an MSA-OUT event occurring at one MSA and f is the 
corresponding MSA-IN event occurring at another MSA. 

Definition 4 (Causal precedence) The causal precedence relation on the set 
E of events in a distributed system is the transitive closure of the union of the 
internal and external causal relations. The causal precedence between events e 
and f is denoted $e \rightarrow f$. 

In Fig. 2, the elementary internal causality is shown with thin straight arrows, 
arcs denote the internal causality, and thick arrows mark the external causality. 
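Definitions 2 and 4 both rest on a transitive closure, which can be illustrated with a minimal Python sketch. The event names (`p1`, `a1`, `b1`, `q1`) and the helper `transitive_closure` are hypothetical and serve only as an illustration: a node registered with MSA A sends a message (`p1`), A forwards it (`a1`, MSA-OUT), MSA B receives it (`b1`, MSA-IN), and a node registered with B receives it (`q1`).

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Elementary internal relations per MSA (Definition 1), hypothetical events.
elementary_a = {("p1", "a1")}          # NSEND -> corresponding MSA-OUT at A
elementary_b = {("b1", "q1")}          # MSA-IN at B -> corresponding NRECV
external = {("a1", "b1")}              # MSA-OUT at A -> MSA-IN at B (Definition 3)

internal_a = transitive_closure(elementary_a)      # Definition 2
internal_b = transitive_closure(elementary_b)
causal = transitive_closure(internal_a | internal_b | external)  # Definition 4
```

Here `("p1", "q1")` belongs to `causal` although it is in neither internal relation, since the chain passes through the external link.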

3 Logical Time Protocol 

Wireline Segment 

Consider a distributed system formed by interacting processes that correspond 
to the set of the mobility service agents and assume that the respective inter- 
nal causal relations are known a priori (before the execution of the logical time 
protocol). Such an abstraction results in a general process model with the par- 
tially ordered event sets. The needs of that model are conventionally met by the 
bit-matrix (BM) logical time system EDI. 

For each event in the system containing N processes, the BM clock assigns 
a timestamp of N components. Each component itself is a vector of bits, the size 
of which (ignoring the trailing zeros) varies with the number of events in the 
corresponding process. The process initialization event receives a timestamp 
containing all-zero vector components. On occurrence of event $e_{ki}$, i.e., the 
$i$-th event in process $A_k$, its BM timestamp is found by computing the bit-wise 
OR over the timestamps of all events $e_{kj}$ such that 
$e_{kj} \xrightarrow{A_k} e_{ki}$; in addition, the $i$-th bit of the proper 
bit-vector is set to 1. Each message in the system carries the time tag equal to 
the timestamp assigned to the message send event. On receipt of a message, the 
BM timestamp of the receive event $e_{ki}$ is computed as the bit-wise OR over 
the timestamps of all internally precedent events $e_{kj}$ as well as the time 
tag of the message; in addition, the $i$-th bit of the proper component is set 
to 1. The timestamp $T(e)$ of any event $e$, assigned by the BM clock, is complete 
with respect to the causal past $Past(e)$, since $a \rightarrow e$ implies 
$T(e)|_a = 1$. Furthermore, as the opposite is true as well, the BM timestamp 
$T(e)$ is indicative: $a \in Past(e) \iff T(e)|_a = 1$. 
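The timestamp rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each bit-vector is stored as a Python integer, events are indexed from 0, and the internal predecessors of every new event are passed in explicitly (the text assumes they are known a priori). The names `BMProcess` and `in_past` are hypothetical.

```python
class BMProcess:
    """Bit-matrix (BM) clock state of one abstract process A_k out of n."""

    def __init__(self, k, n):
        self.k = k            # index of this process
        self.n = n            # number of processes in the system
        self.stamps = []      # stamps[i] is the timestamp of local event i

    def _stamp(self, preds, tag=None):
        ts = [0] * self.n
        sources = [self.stamps[j] for j in preds]
        if tag is not None:
            sources.append(tag)
        for src in sources:
            # Bit-wise OR, component by component.
            ts = [a | b for a, b in zip(ts, src)]
        # Set the bit of the new event in the proper (own) component.
        ts[self.k] |= 1 << len(self.stamps)
        self.stamps.append(ts)
        return ts

    def send(self, preds=()):
        """Create a send event; the returned copy is the message time tag."""
        return list(self._stamp(preds))

    def receive(self, tag, preds=()):
        """Create a receive event from a message tag and internal predecessors."""
        return self._stamp(preds, tag)


def in_past(ts, proc, idx):
    """Indicative test: is event idx of process proc recorded in timestamp ts?"""
    return bool((ts[proc] >> idx) & 1)


a, b = BMProcess(0, 2), BMProcess(1, 2)
tag = a.send()                 # event 0 of A_0
t_recv = b.receive(tag)        # event 0 of A_1, caused by the message
t_next = b.send(preds=[0])     # event 1 of A_1, internally after event 0
t_conc = b.send()              # event 2 of A_1, no internal predecessor
```

Because the event sets are only partially ordered, `t_conc` does not record the send event of the first process, whereas `t_next` does.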

The substantial disadvantage of the BM clock lies in its communication com- 
plexity. Two approaches increasing the feasibility of the method have been con- 
sidered in EH. The idea of the dependency sequence timestamp compression 
technique is to reduce the overhead on average, while maintaining the essen- 
tial information content of the BM timestamps. The two-tier hierarchical clock 
transmits less information on-line, but provides for the restoration of indicative 
event timestamps by means of off-line query exchange. The efficiency of the lat- 
ter algorithm depends on the presumed knowledge by the MSA of all causal 
dependencies between events of the registered mobile nodes, including those de- 
pendencies whose causality chains contain external causal links. If the MSA’s a 
priori knowledge is restricted to the internal causal relations, as defined above, 
the number of the off-line queries required by the two-tier hierarchical clock 
for the indicative timestamp restoration in the worst case reaches the order of 
transmitted information messages ES|. 

An efficient alternative to the bit-matrix clock is offered by the compact 
bit-matrix clock (CBM) mechanism. The underlying idea of CBM is to minimize the 
weight (i.e., the number of bits set to one) of the timestamps assigned on-line, 
while retaining the possibility to restore the indicative timestamp of any event 
off-line using not more than N queries, where N is the number of processes in 
the system. The sparse bit-matrices used to tag the messages in CBM allow 
more efficient compression (with any known lossless technique, of which 
dependency sequences are one example) and hence incur significantly smaller 
on-line communication overhead. The theory of the compact bit-matrix clock is 
based on the following result. Let the local causal past $Past_i(e)$ be the subset 
of $Past(e)$ containing all events that occur at the abstracted process $A_i$. 

Proposition 1 For any given event, the off-line restoration of an indicative 
timestamp requires not more than N queries, if for any event e occurring at 
process $A_i$, the timestamp $T(e)$ assigned on-line 

- is a valid partial timestamp; 

- is complete with respect to $Past_i(e)$, and 

- for each $j \neq i$, is complete with respect to the maximal set¹ 
$Max(Past_j(e), \rightarrow)$. 

In CBM, a message tag does not coincide with the timestamp of the 
corresponding message send event. In its basic form, a CBM message tag carries 
the proper bit-vector that contains just a single bit referring to the send event 
itself. In addition, maintaining an absorption matrix makes it possible to exclude 
from message tags any redundant causal past references that can be inferred by 
the recipient. 
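The maximal set used in Proposition 1 can be made concrete with a short Python sketch. The function name `maximal_set`, the event labels, and the explicit relation `rel` are hypothetical; the relation is assumed to be given already in transitively closed form.

```python
def maximal_set(events, precedes):
    """Return all x in events such that no other element is strictly superior to x.

    precedes(x, y) is True when x strictly precedes y in the order relation.
    """
    return {
        x for x in events
        if not any(precedes(x, y) for y in events if y != x)
    }


# Hypothetical local past at one process: a1 precedes a2, a3 is unrelated.
rel = {("a1", "a2")}
local_past = {"a1", "a2", "a3"}
frontier = maximal_set(local_past, lambda x, y: (x, y) in rel)
```

With a partially ordered event set the maximal set may contain more than one event (here `a2` and `a3`), which is why a timestamp has to be complete with respect to the whole set rather than to a single latest event.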



Wireless Segment 

The logical time protocol proposed for the wireless segment of the network adapts 
the well-known isomorphic vector clock to the wireless environment. First, since 
the membership in the group of mobile nodes within the service area of a given 
MSA changes over time, maintaining a logical time vector at each mobile node 
may not be easily achievable. Second, since mobile nodes conduct their commu- 
nication via the mobility service agent, it is natural to delegate handling of the 
vector time to the MSA, thus reducing the overhead on the wireless links, where 
bandwidth is a critical resource. Third, since the wireless link is, by definition, 
lossy, the means to account for possible undelivered messages in either direction 
should be incorporated into the protocol. 

The centralized management of the vector time employs the MSA events as 
representative images of the mobile node events. Accordingly, the vector times- 
tamps assigned to MSA events may reflect the causal relations between the cor- 
responding mobile node events. We refer to this type of causal relations between 
events of the same MSA as induced causality. There are two types of induced 
causal relations: 

- an MSA-OUT (MSA-RELAY) event causally precedes a subsequent MSA-OUT 
(MSA-RELAY) event as long as both associated messages originate at the 
same mobile node registered with the given MSA; 

- an acknowledged MSA-IN (MSA-RELAY) event causally precedes a subsequent 
MSA-IN (MSA-RELAY) event as long as both associated messages are 
destined to the same mobile node registered with the given MSA and both 
messages are successfully received by that node. 

¹ By definition, the maximal set of $S$ with respect to an order relation 
$\rightarrow$, denoted $Max(S, \rightarrow)$, contains all $x \in S$ such that no 
other element in $S$ is strictly superior to $x$. 

As a message transmitted by an MSA to the mobile node may be lost while 
in transit, the induced causality of the latter type cannot be established at 
the time of the second MSA-IN event occurrence. Instead, it is detected upon 
occurrence of a subsequent MSA-OUT event that acknowledges the receipt of 
the message. The wireless segment protocol accounts for the induced causality 
in order to simplify the process of determining causal precedence between 
mobile events. 

The wireless segment protocol is outlined in Figs. 3 and 4. Class Rmsg_t refers 
to the wireless message overhead, which contains the following fields: locid is 
the sender’s event id, baseid is the id of the most recent acknowledged message 
from the MSA to the mobile node, and bit-vector list[] is the indicator of the set 
of MSA messages received by the mobile node. In Fig. 4, bit-vector Lpast[] is 
the indicator of the local past maximal set, extracted from the received external 
message. 

Hand- Off Procedure 

When a mobile node moves from one location area to another or powers up after 
changing location, the hand-off procedure is executed. In addition to the new 
IP address registration, the hand-off requires an exchange of signaling messages 
between the MSAs of the two location areas. 

First assume that the departing MN is able to inform its old MSA of a 
pending hand-off. Then the notification message carries the overhead fields of 
a regular HSEND event. The MSA handles its receipt as an MSA-OUT event, 
establishing the internal causal relations, forming the compact bit-matrix tag, 
and forwarding the hand-off request to the mobility agent with which the node is 
going to register. In addition to the CBM tag, the request carries the MN’s event 
id. When the new MSA receives the request, it uses the CBM tag as a reference 
into its external history to compute the local vector time. It then assigns a new 
local node id and performs initialization: msgidvect component is set equal to 
the event id passed with the request, whereas basemsgvect component is set to 
0, indicating the hand-off event itself. No protocol information needs to be sent 
by the MSA to the mobile node, as any message received later from the MN 
acknowledges the hand-off by default. 

Alternatively, the hand-off can be executed without prior notification of the 
old MSA. Upon receipt of a message from a previously unregistered node, the 
MSA of a new location area puts the message on hold while forwarding the 
request with the original message tag to the MN’s old MSA. The old MSA processes 
the request as if it were a local MSA-OUT event (see above), forms a CBM tag 
and forwards the hand-off confirmation to the new MSA. The latter initializes 
the data structures and transmits the messages that had been earlier put on 
hold to their destinations. 

class Node_t is                     // mobile node object 
    int  locevid;                   // local event id 
    int  basemsgid;                 // last successfully acknowledged message 
    bit  msglist[];                 // list of received messages 
    int  lastmsg;                   // last received message 

    void HRECV (Rmsg_t msgin) 
    var 
        int baseadv, msgloss; 
    begin 
        locevid := locevid + 1; 
        baseadv := msgin.baseid - basemsgid; 
        if (baseadv > 0) then 
            truncate(msglist, baseadv); 
            basemsgid := msgin.baseid; 
        end 
        msgloss := msgin.locid - lastmsg - 1; 
        msglist := msglist || '0' * msgloss || '1'; 
        lastmsg := msgin.locid; 
    end 

    Rmsg_t HSEND () 
    begin 
        Rmsg_t msgout; 
        locevid := locevid + 1; 
        msgout.locid := locevid; 
        msgout.baseid := basemsgid; 
        msgout.list := msglist; 
        return msgout; 
    end 
end 

Fig. 3. Mobile node behavior in the wireless segment logical time protocol 



Establishing Causal Relation 

To find out the causal relation between two given mobile node events $e$ and 
$f$, first check whether the mobile nodes at the time of the event occurrence 
were registered with one and the same MSA. If this is the case, the local vector 
time, maintained by the MSA, completely characterizes causality. Otherwise, 
assume that $e \rightarrow f$ is the hypothesis to be tested, and event $e$ 
occurs at a mobile node registered with MSA $A_i$, whereas event $f$ occurs at a 
mobile node registered with MSA $A_j$. For the base message send event of event 
$f$, as well as all events referred to by its list of received but unacknowledged 
messages, find the union of their $Max(Past_i(\cdot), \rightarrow)$ sets, as 
determined by the respective CBM timestamps. Then query MSA $A_i$ with respect 
to that set and event $e$. To answer the query, MSA $A_i$ finds the earliest 
MSA-OUT or MSA-RELAY event that is aware of event $e$, and determines whether 
this event belongs to the causal past of any of the query events. If so, the 
hypothesis $e \rightarrow f$ is confirmed; otherwise, it is known to be false. 

class MSA_t is                      // mobility service agent object 
    int     msgidvect[];            // message id per mobile node 
    int     basemsgvect[];          // last acknowledged message per node 
    vtime_t ExtHist[];              // external send history 
    vtime_t IntHist[][];            // internal send history 

    Rmsg_t MSA-IN (int nodeid, bitvect Lpast[]) 
    var 
        Rmsg_t  msg; 
        vtime_t locvtime; 
    begin 
        locvtime := max_{e ∈ Lpast[]} ExtHist[e]; 
        msgidvect[nodeid] := msgidvect[nodeid] + 1; 
        msg.locid := msgidvect[nodeid]; 
        msg.baseid := basemsgvect[nodeid]; 
        IntHist[nodeid][msgidvect[nodeid]] := locvtime; 
        return msg; 
    end 

    void MSA-OUT (int nodeid, int xevid, Rmsg_t msg) 
    var 
        vtime_t locvtime; 
    begin 
        locvtime := max_{i > 0 : msg.list[i] = 1} IntHist[nodeid][msg.baseid + i]; 
        basemsgvect[nodeid] := msg.baseid + length(msg.list); 
        IntHist[nodeid][basemsgvect[nodeid]] := locvtime; 
        locvtime[nodeid] := msg.locid; 
        ExtHist[xevid] := locvtime; 
    end 
end 

Fig. 4. Mobility agent behavior in the wireless segment logical time protocol 
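The pseudocode of Figs. 3 and 4 can be transcribed into a runnable Python sketch, which may help in tracing how acknowledgements and lost messages are handled. Several points are assumptions rather than part of the original figures: `truncate` is taken to drop the oldest bits of the list, the `max` over vector times is taken component-wise (all zeros for an empty set), histories are stored in dictionaries, and the names `RMsg`, `Node`, `MSA`, `vmax` and `bits` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RMsg:
    locid: int                                # sender's event (message) id
    baseid: int                               # last acknowledged MSA message
    bits: list = field(default_factory=list)  # receipt indicator ("list" in Fig. 3)


def vmax(vectors, n):
    """Component-wise maximum of vector times; all zeros for an empty input."""
    out = [0] * n
    for v in vectors:
        out = [max(a, b) for a, b in zip(out, v)]
    return out


@dataclass
class Node:
    locevid: int = 0
    basemsgid: int = 0
    msglist: list = field(default_factory=list)
    lastmsg: int = 0

    def recv(self, msgin):
        self.locevid += 1
        baseadv = msgin.baseid - self.basemsgid
        if baseadv > 0:
            del self.msglist[:baseadv]        # assumed meaning of truncate()
            self.basemsgid = msgin.baseid
        msgloss = msgin.locid - self.lastmsg - 1
        self.msglist += [0] * msgloss + [1]   # zeros mark messages that never arrived
        self.lastmsg = msgin.locid

    def send(self):
        self.locevid += 1
        return RMsg(self.locevid, self.basemsgid, list(self.msglist))


class MSA:
    def __init__(self, n):
        self.n = n                            # number of local mobile nodes
        self.msgidvect = [0] * n
        self.basemsgvect = [0] * n
        self.ext_hist = {}                    # external event id -> vector time
        self.int_hist = [dict() for _ in range(n)]

    def msa_in(self, nodeid, lpast):
        locvtime = vmax([self.ext_hist[e] for e in lpast], self.n)
        self.msgidvect[nodeid] += 1
        self.int_hist[nodeid][self.msgidvect[nodeid]] = locvtime
        return RMsg(self.msgidvect[nodeid], self.basemsgvect[nodeid])

    def msa_out(self, nodeid, xevid, msg):
        seen = [self.int_hist[nodeid][msg.baseid + i]
                for i, bit in enumerate(msg.bits, start=1) if bit]
        locvtime = vmax(seen, self.n)
        self.basemsgvect[nodeid] = msg.baseid + len(msg.bits)
        self.int_hist[nodeid][self.basemsgvect[nodeid]] = list(locvtime)
        locvtime[nodeid] = msg.locid
        self.ext_hist[xevid] = locvtime


msa, n0, n1 = MSA(2), Node(), Node()
msa.msa_out(0, 1, n0.send())       # node 0 sends; external event 1
n1.recv(msa.msa_in(1, [1]))        # delivered to node 1
msa.msa_out(1, 2, n1.send())       # node 1 replies; external event 2
msa.msa_in(1, [2])                 # message 2 to node 1 is lost on the radio link
n1.recv(msa.msa_in(1, [2]))        # message 3 arrives
```

After the reply, the vector time of external event 2 dominates that of external event 1, reflecting that the send at node 0 precedes the send at node 1. After the lost message, the receipt indicator of node 1 records one missing and one received message relative to the new base.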



4 Conclusions 

In the present work, we have focused on the isomorphic tracking of causality in a 
large distributed system running over an IP network with the wireless mobile ac- 
cess. In such a network, mobile nodes communicate with each other and with the 
rest of the network via the mobility service agents. The network can be naturally 
partitioned into wireline and wireless segments with the complementary logical 
time protocols running on each segment. We have used the Compact Bit-Matrix 
clock mechanism in the wireline segment and have proposed a centralized mod- 
ification of the vector time protocol to manage the logical time in the wireless 
segment of the network. We have demonstrated how the two mechanisms can be 
integrated into an isomorphic logical time system. The integrated protocol can 
be readily used to handle hand-offs and requires a single query to test a causality 
hypothesis between any pair of mobile node events. 

Abstract. The convergence of the mobile communications and Internet 
technologies has become a major driving force in the next generation 
telecommunications world. In order to realise the dream of global personal 
communications and allow the users to access information anywhere, at any 
time, the integration of terrestrial and satellite communications networks 
becomes necessary. Because the inter-segment handover is regarded as a key 
element to implement the seamless integration of terrestrial/satellite 
communications networks, it has been placed among top issues in the research 
of global personal communications. In this paper, by adopting mobile-IP, 
intelligent network and dual-mode mobile terminal technologies, the 
architecture of an IP-based integrated terrestrial/satellite mobile 
communications network is proposed, building on the fusion of mobile 
communications and Internet technologies. Based on this architecture, an inter- 
segment handover protocol is described, and its performance is analysed. 



1 Introduction 

In our information society, the last decade has been marked by the tremendous 
success of mobile communications and the Internet. Now, the convergence of the 
mobile communications and Internet technologies has become a major driving force 
in the next generation telecommunications world and leads to the birth of IP-based 
mobile communications networks, the essence of which is to allow the users to access 
information anywhere, at any time. It is generally agreed that the terrestrial and 
satellite communications networks should work together to realise global personal 
communications. Because the inter-segment handover is regarded as a key element to 
implement the seamless integration of terrestrial/satellite communications networks, it 
has been placed among top issues in the research of global personal communications. 
In this paper, by adopting mobile-IP, intelligent network and dual-mode mobile 
terminal technologies, the architecture of an IP-based integrated terrestrial/satellite 
mobile communications network is proposed, building on the fusion of mobile 
communications and Internet technologies. Based on this architecture, a functional 
model is built, an inter-segment handover protocol is described, and its 
performance is analysed using a mixture of simulation and analytical modelling. 

2 Network Architecture 

As shown in Figure 1, the components of the proposed network architecture are designed 
to be consistent with the next generation mobile network architecture defined by the 
Third Generation Partnership Project (3GPP) [1] and ITU [2]. This architecture is 
divided into three subnetworks: mobile equipment, access network and core network 
domains. 

Mobile equipment domain - This consists of the mobile user and the mobile 
terminal. The mobile user is a mobile personal computer that can run a number of 
application programs, and provides real-time and non real-time services. 

The mobile terminal is considered to be a dual-mode terminal that can connect to 
both terrestrial and satellite access networks, and can perform a variety of 
functionalities. The mobile user and mobile terminal can be separated to allow the 
mobile user to connect to any mobile terminal. 

Access network domain - There are two types of access networks: the terrestrial 
access network, which consists of Base Transceiver Stations (BTS) and Radio 
Network Controllers (RNC), and the satellite access network, which includes 
satellites and Fixed Earth Stations (FES). 

Core network domain - The IP-based core network is part of the Internet and 
comprises the gateways, the intelligent networks and the home agent. 

The gateway is a router on the mobile node’s visited network and provides routing 
services to the mobile user while the mobile user attaches to it. It also provides the 
switch function and is fully controlled by the intelligent network. 




The intelligent network is the “intelligent part” of the mobile communications 
networks, and takes charge of call and connection control, mobility management, and 
service management. There are three intelligent networks in the core network: one is 
the home intelligent network, where the mobile terminal registers, and the other two 
are visited intelligent networks, which the mobile terminal may visit. These two 
visited intelligent networks take charge of the terrestrial and satellite mobile networks 
respectively. When the mobile terminal roams to a visited network, the visited 
intelligent network exchanges the relevant information with the mobile terminal’s 
home intelligent network. 

The home agent is a router on a mobile user’s home network and maintains 
subscriber information and tunnels datagrams to the mobile user when it is away from 
its home network [3]. 



3 Functional Model 

Figure 2 shows the functional model of the proposed network architecture. 

The functional entities in this model are consistent with the definition of ITU [4], 
and their functionalities can be deduced from their names. This model can be divided 
into five parts: the MU, the MT, the RNC/FES, the intelligent networks and the fixed 
user. 

Both MU and Fixed User have only one functionality: User Identification 
Management Function (UIMF), which handles the identification and service related to 
the user. 

In the MT there are four associated functionalities: Call Control Agent Function 
(CCAF), Mobile Control Function (MCF), Radio Access Control Agent Function 
(RACAF), and Mobile Radio Transmission and Reception (MRTR). CCAF takes 
charge of call control, and MCF handles service control. RACAF and MRTR take 
charge of radio access control related functions. 

In the RNC and FES, there are two associated functionalities: Radio Access 
Control Function (RACF) and Radio Frequency Transmission and Reception (RFTR). 
These two entities take charge of radio access control related functions. 



Fig. 2. Functional Model 

There are seven associated functionalities in the intelligent network: Service Data 
Function (SDF), Service Control Function (SCF), Authentication Management 
Function (AMF), Location Management Function (LMF), Service Access Control 
Function (SACF), Call Control Function (CCF) and Service Switching Function (SSF). 
SDF and AMF control the storage and provide data-related services. SCF manages 
all the service control and mobility management in the mobile communications 
network. CCF controls the call/connection processing. SACF provides both 
call-related and call-unrelated processing and control. SSF maintains the 
interaction between SCF and CCF. LMF takes charge of terminal mobility and 
location management. 

The Generic Radio Access Network (GRAN) concept is used in this paper to divide 
the access network’s functionalities into radio-independent and radio-dependent parts, 
which allows different radio access modules to be connected to the same network 
infrastructure via a unique interface. 

Since the inter-segment handover protocol proposed in this paper is 
radio-independent, it can be used in both terrestrial and satellite systems. 



4 Inter-segment Handover Protocol 

Handover has three phases: initiation, decision and execution. The handover can be 
initiated and executed for various reasons and can be divided into different kinds 
according to handover control, connection establishment and connection transference 
schemes. 

The inter-segment handover strategy is critical in the design of IP-based terrestrial 
/satellite mobile communications networks. It not only enables the mobile terminal to 
change the radio link that connects the mobile terminal and the network to maintain a 
good QoS while the mobile terminal moves into a different network, but also allows 
the mobile user to obtain a new IP address and inform the home agent and 
corresponding user of this change in order to receive the packets via the new link. 

In order to design a fast and efficient inter-segment handover protocol, a soft, 
mobile-controlled handover has been selected as the handover method according to 
the characteristics of the inter-segment handover. The reasons are as follows: 

1. The costs of the satellite link and terrestrial link are different and the user should 
have the right to choose the suitable segment. 

2. The user can select a suitable segment according to the different geographical 
environments, which reduces unnecessary handovers. 

3. The signalling exchanges between the MT and network can be reduced to a 
minimum. The mobile-controlled handover will reduce the signalling load and the 
signalling delay and improve the efficiency of the network. 

4. In the soft handover, the new link is established before the old link is released, 
and the data can be transmitted in both links simultaneously. To keep a good QoS, 
especially reducing the packet loss during the handover, the soft handover is also 
selected. 

Inter-segment handover can occur in either of two directions: the satellite-to- 
terrestrial handover and the terrestrial-to-satellite handover. Since the procedures of 
these two kinds of inter-segment handover are very similar, only the satellite-to- 
terrestrial inter-segment handover protocol is described in this paper. The procedure is 
as follows: 

Fig. 3. Signalling Flow for the Satellite-to-Terrestrial Inter-Segment Handover Procedure 



During the handover initiation phase, when the mobile terminal detects a QoS 
change, it can access a terrestrial link and send a handover initiation request message 
to the network. When the FES receives this request, it forwards the message to the 
visited intelligent network and the latter relays the message to the home intelligent 
network. The home intelligent network checks the request, identifies the target 
terrestrial cell, and sends the relevant information to the visited intelligent 
network in the satellite access network. 

Then the visited intelligent network in the satellite access network sends a resource 
access request to its terrestrial counterpart. If the requested resource is available, the 
RNC in the terrestrial network is asked to reserve the radio resource. The intelligent 
network in the satellite access network sends a handover command, which includes 
the information about the new radio bearer, to the mobile terminal via the old 
signalling link. 

When the mobile terminal receives this message, it enters the handover execution 
phase. Firstly, it initiates the procedure of the new radio bearer setup, and a setup 
message is sent to the RNC via a BTS. Then the mobile terminal starts to establish a 
radio bearer in the new link to transport signalling and user data packets. After 
establishing the terrestrial radio bearer, the mobile terminal is connected to both the 
old and new bearers at the same time. By using mobile IP, the mobile user can obtain 
a new care-of-address, and then send messages to the home agent and the 
corresponding user to update their binding caches according to the optimal routing 
principle. Thereby, the new packets sent from the corresponding user can be delivered 
to the mobile user via the new link. After achieving this, the mobile terminal switches 
the connection from the old satellite link to the assigned terrestrial bearer. Finally, the 
mobile terminal releases the old satellite radio bearer. 

5 Performance Analysis 



The performance of the proposed signalling protocol is evaluated in terms of 
QoS parameters such as protocol execution delay, throughput, and handover failure 
probability. 

Delay is always a very important QoS parameter to evaluate the performance of 
signalling protocols. We will use the protocol execution delay to test the proposed 
inter-segment handover protocol. The protocol execution delay can be calculated as 
follows: 

Let 

$T_1$ = overall transmission delay 
$T_2$ = overall propagation delay 
$T_3$ = overall waiting delay 
$T_4$ = overall processing delay 

If $T$ = protocol execution delay, then 

$$T = \sum_{i=1}^{4} T_i \qquad (1)$$ 



The mean transmission delay is the average time needed to push a message 
onto the channel. It depends on the message lengths and the bit rates of the 
transmission links. The overall transmission delay is the sum of the transmission 
delays for all the messages being transmitted among different nodes. 

The mean propagation delay is the average time a message takes to propagate 
between nodes, and it depends on the distance between the nodes. The overall 
propagation delay is the sum of the propagation delays for all the messages being 
transmitted among different nodes. 
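The dependence of these two delay components on message length, bit rate and distance can be illustrated with a small Python sketch. The standard relations (length divided by bit rate, distance divided by propagation speed) are assumed here, and all numerical values below, including the link rate and the satellite altitude, are hypothetical inputs chosen for illustration rather than figures from the paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, free-space propagation


def transmission_delay(message_bits, bit_rate_bps):
    """Time to push a message of the given length onto a link, in seconds."""
    return message_bits / bit_rate_bps


def propagation_delay(distance_m, speed_mps=SPEED_OF_LIGHT):
    """Time for a signal to travel the given distance, in seconds."""
    return distance_m / speed_mps


# Hypothetical signalling exchange: (message length in bits, link rate, hop distance).
messages = [
    (1000, 64_000, 35_786_000.0),   # terminal to a geostationary satellite
    (1000, 2_000_000, 50_000.0),    # terrestrial hop
]

overall_transmission = sum(transmission_delay(b, r) for b, r, _ in messages)
overall_propagation = sum(propagation_delay(d) for _, _, d in messages)
```

Summing over all messages of the exchange gives the overall transmission and propagation delays that enter the protocol execution delay.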

The mean waiting delay is the average time a message waits (is queued) in the 
system. It depends on the traffic load, the number of users, the number of servers, and 
the sizes of buffers. The traffic is generated by two traffic generators: one is a 
signalling traffic generator, and the other is a background traffic generator, both of 
which have an exponential inter-arrival time distribution. We have also assumed that 
the service time of packets follows an exponential distribution in the analytical model. 
This will be approximately true since packet length has been taken to be exponentially 
distributed for the background traffic, and the signalling traffic and background traffic 
have the same mean values, although the former has a generally distributed packet 
length. The analytical model used is based on an M/M/1 queue with infinite buffer 
and therefore the waiting time of a single message in the queue can be calculated 
according to the following formula [5]: 



T_{waiting} = \frac{1}{\mu - \sum_{i=1}^{N} \lambda_i}    (2) 

where 

\lambda_i = mean arrival rate from the i-th source 
\mu = mean service rate 
N = number of traffic sources 
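
The M/M/1 model described above can be illustrated with a short numerical sketch. It is not the authors' code: the function name `mm1_times` and the example rates are hypothetical, and the sketch simply evaluates the two standard M/M/1 means (mean time in the system and mean time spent queueing) for several independent Poisson sources sharing one exponential server. 

```python
def mm1_times(arrival_rates, service_rate):
    """Return (mean time in system, mean time in queue) for an M/M/1 queue.

    arrival_rates: per-source Poisson rates in messages/second; independent
    Poisson streams superpose into one Poisson stream with the summed rate.
    service_rate: mean service rate mu in messages/second.
    """
    lam = sum(arrival_rates)
    if lam >= service_rate:
        raise ValueError("unstable queue: total arrival rate must be below mu")
    time_in_system = 1.0 / (service_rate - lam)                  # queueing + service
    time_in_queue = lam / (service_rate * (service_rate - lam))  # queueing only
    return time_in_system, time_in_queue


# Hypothetical rates (not taken from the paper): a signalling source at
# 20 msg/s, a background source at 60 msg/s, and mu = 100 msg/s.
w, wq = mm1_times([20.0, 60.0], 100.0)
print(f"time in system = {w * 1e3:.1f} ms, time in queue = {wq * 1e3:.1f} ms")
```

Both means grow without bound as the total arrival rate approaches the service rate, which is the behaviour that makes delay so sensitive to load near saturation. 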
The processing delay is the time taken to process a message in a node, and 
includes the delays resulting from encapsulation, decapsulation, routing and all the 
other operations related to the processed message. 
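
The way the four components combine into the protocol execution delay can be sketched as follows. This is a minimal illustration, not the authors' model: the `Message` type, the 1000-bit message length and the 10 ms waiting delay are hypothetical, while the 200000 bit/second signalling rate, the 0.5 ms processing delay and the 70 ms / 1 ms propagation delays are borrowed from the simulation parameters reported later in this section. 

```python
from dataclasses import dataclass


@dataclass
class Message:
    length_bits: int      # message length in bits
    bit_rate: float       # bit rate of the link carrying it (bit/s)
    propagation_s: float  # propagation delay of that link (s)
    waiting_s: float      # queueing delay experienced (s)
    processing_s: float   # processing delay in the receiving node (s)


def execution_delay(messages):
    """Sum the four delay components over all messages of one handover."""
    t1 = sum(m.length_bits / m.bit_rate for m in messages)  # transmission
    t2 = sum(m.propagation_s for m in messages)             # propagation
    t3 = sum(m.waiting_s for m in messages)                 # waiting
    t4 = sum(m.processing_s for m in messages)              # processing
    return t1 + t2 + t3 + t4


# Hypothetical two-message exchange: one message over the satellite link,
# one over the terrestrial link.
exchange = [
    Message(1000, 200_000, 0.070, 0.010, 0.0005),
    Message(1000, 200_000, 0.001, 0.010, 0.0005),
]
print(f"protocol execution delay = {execution_delay(exchange) * 1e3:.1f} ms")
```

Even in this toy exchange the single satellite hop contributes most of the total, which is why the number of messages sent over the satellite link matters so much. 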

Since throughput is another important QoS parameter, it is also used to judge the 
performance of the proposed inter-segment handover protocol. 

The handover failure probability is used to test the inter-segment handover 
protocol under different traffic loads and pre-set handover time limits. To maintain 
a satisfactory QoS, a handover should be performed as fast as possible, which means 
that the signalling messages exchanged between the mobile user and the network should 
be minimised. To evaluate the handover failure probability of the inter-segment 
handover, a time limit is pre-set and the delay is measured from the moment the mobile 
user sends the first message to the network to initiate an inter-segment handover 
until the “HO_complete.Ind” message is received. If the delay exceeds the pre-set 
handover limit, the handover is regarded as failed. 
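
The failure criterion just described amounts to counting the handovers whose measured delay exceeds the limit. A minimal sketch, assuming a list of measured delays is available; the function name and the delay samples below are hypothetical and are not results from the paper: 

```python
def handover_failure_probability(delays_ms, limit_ms):
    """Fraction of handovers whose execution delay exceeds the pre-set limit."""
    if not delays_ms:
        raise ValueError("need at least one measured handover delay")
    failed = sum(1 for d in delays_ms if d > limit_ms)
    return failed / len(delays_ms)


# Hypothetical delay samples in milliseconds (illustrative only).
delays = [344, 380, 410, 457, 520, 390, 430, 470, 395, 405]
for limit in (400, 450, 500):
    p = handover_failure_probability(delays, limit)
    print(f"limit {limit} ms -> failure probability {p:.2f}")
```

Raising the limit can only lower the estimated failure probability for a fixed set of delays, which is the trade-off examined in the results below. 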

The commercial software package OPNET is used to simulate the behaviour of the 
inter-segment handover protocol in an IP-based integrated terrestrial/satellite 
environment. 

The numerical assumptions made are given in Table 1, and the satellite 
parameters have been chosen to be consistent with a Medium Earth Orbit (MEO) link. 



Table 1 Parameters Used in the Simulation 

Parameter                                            Value 
Mean service rate                                    100000 bit/second 
Radio link bit rate                                  2000000 bit/second 
Signalling bit rate for transmitters and receivers   200000 bit/second 
Processing delay for each message                    0.5 ms 
Propagation delay between MT and FES                 70 ms 
Propagation delay between MT and RNC                 1 ms 
Propagation delay between VIN,„ and HIN              1 ms 
Propagation delay between VIN,„ and HIN              1 ms 
Propagation delay between VIN,„ and VIN,„            1 ms 



The results show that the protocol execution delay, throughput, and handover 
failure probability are highly dependent on the traffic loads. 

Figure 4 shows the protocol execution delay of the inter-segment handover. 
Because the propagation delay of the satellite link is much longer than that of the 
terrestrial link, the number of messages exchanged via the satellite link dominates the 
overall propagation delay during handover and also has a great impact on the waiting 
delay, and hence on the protocol execution delay. For a satellite-to-terrestrial 
handover, when the traffic load rises from 5 handovers/second to 20 handovers/second, 
the delay increases from 344 ms to 457 ms. However, when the traffic load rises from 
20 handovers/second to 23 handovers/second, the delay increases from 457 ms to 704 ms. 
The results show that the satellite-to-terrestrial handover is much faster than the 
terrestrial-to-satellite handover. For example, at a traffic load of 20 
handovers/second, the satellite-to-terrestrial handover takes 457 ms and the 
terrestrial-to-satellite handover takes 1140 ms. This agrees with the common view 
that terrestrial communications is the first choice and that satellite communications 
serves as an alternative in certain situations, such as when the terrestrial network 
is not available. Where possible, the user can quickly perform a 
satellite-to-terrestrial handover in pursuit of a higher QoS and lower cost. 
Figure 4 also shows that the simulation results match reasonably well the analytical 
results predicted by a model incorporating equations (1) and (2), thus giving a degree 
of confidence in the simulation. 




Fig. 4. Effects of Load on the Inter-segment Handover Protocol Execution Delay 



Together, Figures 4 and 5 show that the growth of traffic load leads to an increase 
in throughput, but at the cost of an increased handover delay. Since an infinite buffer 
is used in the queue, there is no packet loss. As the traffic load increases, the 
throughput increases nearly linearly. 

Figure 6 shows the effects of the network traffic and the pre-set handover limit on 
the handover failure probability. Because the handover execution delay in the 
terrestrial-to-satellite inter-segment handover is much longer than that of the 
satellite-to-terrestrial inter-segment handover, a higher pre-set limit is used in the 
former. Handover time limits of 400 ms, 450 ms and 500 ms have been used for the 
satellite-to-terrestrial inter-segment handover, and limits of 1100 ms, 1150 ms and 
1200 ms for the terrestrial-to-satellite inter-segment handover. Note that if a 
satellite-to-terrestrial inter-segment handover is performed when the traffic load is 
12.5 handovers/second, the handover failure probability is 5.6%, 2.4%, and 1.1% for 
handover limits of 400 ms, 450 ms and 500 ms respectively. The heavier the traffic 
load, the higher the handover failure probability, and the higher the pre-set limit, 
the lower the handover failure probability, so a compromise is required. 
6 Conclusion 

In this paper, a new architecture for an IP-based integrated terrestrial/satellite mobile 
communications network is proposed and an inter-segment handover protocol is 
described. The performance of the proposed inter-segment handover protocol is also 
analysed and evaluated. The results show that both the protocol execution delay and 
the throughput are highly dependent on the traffic load. The handover failure 
probability depends on a trade-off between the network traffic and the pre-set 
handover limit. In this context, it is shown that a much higher limit has to be used 
for the terrestrial-to-satellite handover than for the satellite-to-terrestrial 
handover. This is due to the longer propagation delay on the satellite link and the 
larger number of messages that must be exchanged via the satellite link in the 
terrestrial-to-satellite inter-segment handover protocol. The proposed architecture 
and inter-segment handover protocol provide a potential solution for seamlessly 
integrating the terrestrial and satellite networks in future IP-based global mobile 
communications networks. 
Abstract. In this paper, we study the effect of integrating voice and 
data services in wireless data networks. In particular, we evaluate the 
performance of the cdma2000 system, which is currently one of the strongest 
contenders for wireless data networking technology. Since the true nature 
of the wireless data is still not known, we use a mix of Poisson-distributed 
voice packets and Pareto-distributed data packets. The ratio of the mix is 
varied and the performance is studied with respect to channel utilization, 
waiting time, and blocking probability. The effect of the shape parameter 
of the Pareto-distributed data packets on the system performance is also 
studied. 

Keywords: wireless voice/data integration, Pareto distribution, cdma2000 
networks, resource allocation. 



1 Introduction 

All the emerging wireless data networking technologies today rely on the 
Internet Protocol (IP) because IP is still the dominant internetworking 
protocol. Moreover, the existing Internet infrastructure should be exploited 
as much as possible to defray the cost of overlaying wireless technology. The 
advances in IP, coupled with the provisioning of quality of service (QoS) for 
multimedia applications, make IP a good choice for cellular providers to deliver 
services to their already huge customer base. 

The integration of voice services from the cellular domain and data services 
from the IP domain is still a major challenge because of the differences in the 
characteristics and requirements of the two services. Voice services are 
delay-sensitive and aim to provide equal service to all users regardless of their 
location in the cell within the cellular architecture. Moreover, voice calls last for 
a few minutes, with silence between talk spurts most of the time. A better channel 
utilization can thus be obtained by sharing the channel resources with other 
voice terminals. These features result in allocation of power depending on the 
individual user’s voice activity. Voice services until now have been optimized 
only for voice telephony, which requires a continuous bit-stream type of service 
with no delays. Additionally, a relatively modest data rate is sufficient for 
high-quality voice service, and voice users cannot substantially benefit from higher 
data rates. 

In contrast, data services work well with discontinuous packetized transmis- 
sion and can tolerate more delay. Packet data systems are aimed at maximizing 
the throughput. Given various data rate requirements for different data users, 
the goal is no longer to serve everyone with equal power and equal grade of 
service. Rather, the goal is to allocate users the maximum data rate that each 
can accept based on application needs and wireless channel conditions. 

To provide multimedia services through wireless channels, it is also important 
that certain quality of service (QoS) parameters specified by the applications 
are satisfied, such that a balance between the number of users served and their 
degree of satisfaction is obtained [?]. Schemes such as efficient admission 
control, optimal resource management and good error control are adopted to 
achieve this. The requirements for future multimedia services include higher 
capacities, increased spectral efficiency, higher speeds and differentiated services. 

cdma2000, which is an evolution of the IS-95 standard, is currently one of 
the strongest contenders for wireless data networking [?]. It provides 
next-generation capacity while maintaining backward compatibility [?]. In this 
paper, we investigate the ability of cdma2000 to support both voice and data 
services. Five classes of service are considered depending on their bit rate 
requirements. For modeling voice services, we use a Poisson arrival process, and 
hence the inter-arrival time of these packets is exponentially distributed. For data 
services, the session duration as well as the inter-arrival time of packets are 
modeled using the Pareto distribution [?]. Different shape parameters are 
considered to account for the variability in the burstiness of the data packets. The 
ratio of the load due to voice services to the load due to data services is varied 
over the entire possible range, from a mix of 0% voice and 100% data to 100% voice 
and 0% data. The metrics we consider for the evaluation are channel utilization, 
waiting time and blocking probability. 

The rest of the paper is organized as follows. In Section 2, we discuss the 
nature of wireless data. Section 3 presents our approach for modeling both voice 
and data services. Section 4 describes our simulation model while Section 5 
discusses the experimental results. Conclusions are drawn in the last section. 

2 Nature of Wireless Data 

Modeling network traffic using a Poisson or Markovian arrival process is 
common because of its theoretical simplicity. It also has some favorable properties, 
such as the smoothing of the total traffic by statistical aggregation of multiple 
Markovian arrival processes, which are individually bursty in nature. Careful 
statistical analysis of data collected from experiments on Ethernet LAN traffic 
[?] over long durations has shown that such data actually exhibit properties of 
self-similarity [?] and that there are long-range dependencies among the data. 
It is observed that such traffic is bursty over a wide range of time scales and 
can usually be generated by heavy-tailed distributions with infinite variance [?]. 
The Pareto distribution is such a distribution, with a heavy tail and large burstiness. 
This self-similar nature of Ethernet traffic is different both from the conventional 
traffic models and from the currently considered formal models of packet 
traffic [?]. However, there is still considerable debate over the actual modeling 
of network traffic because it has serious implications for the design and analysis 
of networks. Analysis using self-similar models generally ignores the time scale 
over which the experiments are performed. The finite range of the time periods of 
our observations makes it necessary to study and model network traffic as not 
strictly self-similar. The amount of correlation that we consider should 
depend not only on the correlation structure of the source traffic but also on the 
time scale that is specific to the system under consideration. 

Considering only one of the two distributions (Poisson or Pareto) will 
not truly represent the nature of wireless data, and therefore characterization 
becomes difficult. In fact, to date there exists no unified model which represents 
the true nature of wireless data. Services like file transfer, e-mail and 
store-and-forward facsimile, usually known as short-message services (SMS), are 
relatively short and can be represented by a Poisson model [?]. Interactive data 
services can be modeled as a queue of packets at each source with a random 
arrival process into the queue. The expected session length of these services is 
1-2 minutes, which is too short an interval to see the effect of any long-range 
dependencies. It might even be difficult to find any kind of correlation in 
the data pattern within that interval of time. 



3 Our Voice/Data Model 

It is our belief that wireless traffic would not strictly follow Poisson or Pareto 
distributions, but will have components from both. Now the question arises about 
the percentage contribution of voice and data. It has been recently observed 
that there is a steady increase in the number of data users. What might be 
appropriate is to evaluate a system where all possible combinations of voice and 
data components for a given load are considered. Let us now consider the two 
models. 



3.1 Voice Model 

We assume that active users produce and transmit voice packets at a certain 
rate and inactive users do not transmit at all. A voice call shows periods of 
activity and inactivity. We model the durations of both talk spurts (activity) and 
gaps (inactivity) as exponentially distributed. If the mean duration of the talk 
spurts is T_t and the mean duration of the gaps is T_g, then the activity, A, is 
defined as 

A = \frac{T_t}{T_t + T_g}. 

If the length of a talk spurt is given by the random variable T, then 

T = -T_t \ln(1 - U) 

where U is uniformly distributed between 0 and 1. We also assume that the 
duration of a voice session is exponentially distributed with a certain mean. 
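
The talk-spurt generation described above is inverse-transform sampling of an exponential distribution. A minimal sketch, assuming the activity is the fraction of time spent in talk spurts; the function names, the seed and the unit mean durations are hypothetical choices made only for illustration: 

```python
import math
import random


def activity(mean_talk, mean_gap):
    """Fraction of time a voice source spends in talk spurts."""
    return mean_talk / (mean_talk + mean_gap)


def talk_spurt(mean_talk, rng=random):
    """Draw one exponentially distributed talk-spurt length."""
    u = rng.random()                       # uniform on [0, 1)
    return -mean_talk * math.log(1.0 - u)  # 1 - u lies in (0, 1], so log is finite


rng = random.Random(1)                     # fixed seed for repeatability
samples = [talk_spurt(1.0, rng) for _ in range(100_000)]
print("activity:", activity(1.0, 1.0))
print("sample mean of talk spurts:", sum(samples) / len(samples))
```

With equal mean talk and gap durations the activity is 0.5, and the sample mean of the generated spurts should lie close to the chosen mean of 1.0. 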



3.2 Data Model 

Current Internet traffic has been shown to be self-similar, which means that such 
traffic is bursty over a wide range of time scales. As Ethernet traffic was shown 
to be different from conventional traffic models, we use a heavy-tailed Pareto 
distribution for modeling data traffic. The active data spurts are assumed to 
be independent and identically distributed according to the Pareto distribution 
with shape parameter \alpha and scale parameter k. The cumulative distribution 
function F(t) of the Pareto distribution is given by 

F(t) = 1 - \left(\frac{k}{t}\right)^{\alpha}, \quad t \ge k. 

The burstiness of the data packets can be controlled by changing the shape 
parameter \alpha. It is observed in [?] that for all practical purposes 
1 < \alpha < 2. If the length of a data spurt is given by the random variable D, 
then D can be obtained as 

D = k (1 - U)^{-1/\alpha} 

with U uniformly distributed between 0 and 1, as before. It is also assumed that the 
duration of a data session is Pareto distributed with a certain mean. 
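
Data spurts can be drawn from the Pareto distribution by inverting its cumulative distribution function, in the same way as for the voice model. The sketch below is illustrative only: the function names, the seed, and the choice of shape 1.5 and scale 1.0 are assumptions, not values from the paper. 

```python
import random
import statistics


def pareto_cdf(t, shape, scale):
    """Pareto CDF: 1 - (scale / t) ** shape for t >= scale, else 0."""
    if t < scale:
        return 0.0
    return 1.0 - (scale / t) ** shape


def pareto_spurt(shape, scale, rng=random):
    """Draw one Pareto-distributed spurt length by inverting the CDF."""
    u = rng.random()                       # uniform on [0, 1)
    return scale * (1.0 - u) ** (-1.0 / shape)


rng = random.Random(7)                     # fixed seed for repeatability
spurts = [pareto_spurt(1.5, 1.0, rng) for _ in range(100_000)]
print("smallest spurt:", min(spurts))
print("sample median:", statistics.median(spurts))
print("theoretical median:", 1.0 * 2 ** (1 / 1.5))
```

The median is used as the check because, for a shape parameter between 1 and 2, the variance is infinite and the sample mean converges slowly. 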



3.3 Multiplexing Voice and Data 

As mentioned earlier, wireless data will have packets from the IP domain as well 
as the telephony domain. Therefore, the base station would receive both voice 
and data multiplexed. For any resource allocation scheme, the base station has 
to consider the multiplexed stream. If we define the voice fraction K as 

K = \frac{\text{voice load}}{\text{voice load} + \text{data load}} 

then (1 - K) is the data fraction of the load. The two extreme cases are K = 0, 
implying no voice component, and K = 1, implying no data component. Note 
that at any point of time, the total load in the system remains the same. 
3.4 Performance Metric 

We do not consider any prioritization between voice and data users in our service 
admission scheme. This may not be true in practice. Different operators may 
assign priority between voice and data users based on the potential revenue 
provided by either type of subscriber. For admitting a service into the system, 
we compare the current load to the total channel capacity C fixed at 100. A 
service is blocked or denied admission into the system if the capacity with the 
inclusion of the new service exceeds C. The percentage of the blocked services 
gives the blocking probability. Once a service is admitted, it generates bursts 
depending on the nature of the service. For every burst, the scheduler at the base 
station tries to allocate the required number of traffic channels. Every burst has 
to contend for the required number of traffic channels before it could go into the 
transmission phase. The burst is not allocated any traffic channel unless and until 
the required number of traffic channels are available. This contention process 
incurs some delay. We define waiting time (which is a measure of the delay) as 
the average of the delays incurred by the bursts. A service which acquires some 
traffic channels for the transmission of a burst, relinquishes those channels soon 
after the successful transmission of the burst. We also define channel utilization 
as the fraction of the time the traffic channels actually transmit packets. 
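
The admission rule and the blocking probability defined above can be sketched as follows. This is a minimal illustration rather than the authors' simulator: the class and method names are hypothetical, and measuring each service's demand in traffic channels is an assumption. 

```python
class AdmissionController:
    """Admit a service only if the load stays within the capacity C."""

    def __init__(self, capacity=100):
        self.capacity = capacity   # total channel capacity C
        self.load = 0              # load of the services currently admitted
        self.requests = 0
        self.blocked = 0

    def request(self, demand):
        """Try to admit a service; return True if admitted, False if blocked."""
        self.requests += 1
        if self.load + demand > self.capacity:
            self.blocked += 1
            return False
        self.load += demand
        return True

    def release(self, demand):
        """Return the capacity of a service that has ended."""
        self.load -= demand

    def blocking_probability(self):
        return self.blocked / self.requests if self.requests else 0.0


ac = AdmissionController(capacity=100)
outcomes = [ac.request(12) for _ in range(9)]  # the ninth request would exceed C
outcomes += [ac.request(3), ac.request(1), ac.request(1)]
print(outcomes)
print("blocking probability:", ac.blocking_probability())
```

No priority is given to either service type here, matching the non-prioritized admission scheme described above. 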

4 Simulation Model 

For the purpose of studying the effectiveness of simultaneously supporting voice 
and data traffic, we choose to use the cdma2000 system [?]. cdma2000 includes 
sophisticated medium access control (MAC) features which can concurrently 
support multiple data and voice services, thus effectively supporting very high 
data rates (up to 2 Mbps). cdma2000 extends support for multiple simultaneous 
services well beyond what is provided by IS-95-B. It does so by 
providing much higher data rates and a sophisticated multimedia QoS capability 
to support multiple voice/packet, circuit/data connections with differing QoS 
requirements. The design of cdma2000 allows for deployment of the 3G enhancements 
while maintaining the current 2G support for IS-95 in the spectrum that 
an operator has today. It is also compatible with the IMT-2000 spectrum bands 
[?], so operators acquiring new spectrum will be able to experience the benefits 
of cdma2000 as well. 

In our simulation experiments, we considered 5 classes of traffic depending on 
their data rates. The data rates considered are in the ratio 1:3:6:9:12. This is 
because cdma2000 has a multi-carrier feature in which a service can be allocated 
more than one (3, 6, 9 or 12) traffic channels for data transmission. Requests for 
connection establishment are made for both types of service at a certain arrival 
rate \lambda. Requests for connection establishment from all the classes are assumed 
to be equally probable. If a service is admitted into the system, the service 
specifies its requirement profile in terms of its type (voice/data) and its bit 
rate requirement. The generation of bursts (a burst being a collection of packets) 
is Poisson distributed for a voice service and Pareto distributed for a data service. 
Fig. 1. Average Waiting Time 



The session lengths for voice and data services are exponentially and Pareto 
distributed, respectively. We restrict the duration of a data session to 10 times 
the average duration. The other parameters used for simulation are shown in 
Table 1. 



Table 1. Parameters used for simulation 

Parameter                          Value 
No. of Traffic Channels (C)        100 
Service Arrival Rate (\lambda)     0.005/frame 
Activity (A)                       0.5 
Mean Voice Session                 2 mins 
Mean Data Session                  2 mins 
Maximum Data Session               20 mins 
Frame Duration                     20 ms 
Mean Burst Length                  30 frames 



Fig. 2. Channel Utilization 

Fig. 3. Blocking Probability 

Static allocation of code channels to a small number of users leads to an 
inefficient use of the CDMA air interface capacity. Dynamic allocation of the code 
channels to the arriving bursts makes it possible to share the channels among a 
large number of users without compromising their QoS requirements. The allocation 
of the codes is done at the beginning of each frame. This differs from 
the present cdma2000 system, which cannot continuously reallocate codes at the 
beginning of each frame. However, the proposed enhancement to IS-2000 called 
1XTREME implements this feature, and we assume that our 
system can dynamically reallocate codes at the beginning of each frame. The 
number of traffic channels that can carry data simultaneously is limited by the 
number of traffic channels available and not by noise or multiple access 
interference (MAI) [?]. This can be justified by the fact that the interference 
among the users is proportional to the additive signal strength, so the noise 
interference can be approximated as a linear function of the number of traffic 
channels. 
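
The per-frame dynamic allocation described above, together with the waiting time and channel utilization metrics, can be sketched as a small frame-driven loop. This is an illustration under stated assumptions and not the authors' simulator: the `Burst` type and `simulate` function are hypothetical, bursts are served first-fit in arrival order (the paper does not specify the order), and a burst receives either all of its required channels or none. 

```python
from dataclasses import dataclass


@dataclass
class Burst:
    arrival: int     # frame in which the burst is generated
    channels: int    # traffic channels required (all-or-nothing)
    length: int      # transmission time in frames
    start: int = -1  # frame in which transmission began (-1: not yet)


def simulate(bursts, total_channels, n_frames):
    """Return (mean waiting time in frames, channel utilization)."""
    pending = sorted(bursts, key=lambda b: b.arrival)
    waiting, active = [], []
    free = total_channels
    busy_channel_frames = 0
    next_arrival = 0
    for frame in range(n_frames):
        # Release the channels of bursts whose transmission has finished.
        free += sum(b.channels for b in active if b.start + b.length <= frame)
        active = [b for b in active if b.start + b.length > frame]
        # Queue the bursts that have arrived by this frame.
        while next_arrival < len(pending) and pending[next_arrival].arrival <= frame:
            waiting.append(pending[next_arrival])
            next_arrival += 1
        # Allocate at the start of the frame, first-fit in arrival order.
        still_waiting = []
        for b in waiting:
            if b.channels <= free:
                free -= b.channels
                b.start = frame
                active.append(b)
            else:
                still_waiting.append(b)
        waiting = still_waiting
        busy_channel_frames += total_channels - free
    served = [b for b in bursts if b.start >= 0]
    mean_wait = sum(b.start - b.arrival for b in served) / len(served) if served else 0.0
    return mean_wait, busy_channel_frames / (total_channels * n_frames)


# Hypothetical workload: three bursts contending for 12 channels over 10 frames.
demo = [Burst(0, 9, 3), Burst(0, 6, 2), Burst(1, 3, 2)]
print(simulate(demo, total_channels=12, n_frames=10))
```

In this toy workload the 6-channel burst has to wait until the 9-channel burst finishes, while the later 3-channel burst fits immediately, which shows how the all-or-nothing rule produces contention delay. 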

5 Experimental Results 

Figure 1 shows how the waiting time varies as the voice fraction (K) changes 
from a minimum of 0 to a maximum of 1. It is seen that for smaller values of 
the shape parameter \alpha, the behavior is erratic, i.e., the average waiting time 
is non-monotonic in the voice fraction. The channel utilization (Figure 2) 
remains almost the same except for \alpha = 1.1. The blocking probability (Figure 3) 
increases with the increase in the voice component. It can be noted that at K = 1, 
all plots corresponding to different \alpha converge. This is because there is no 
data component and the voice component offers the same load. 

Fig. 4. Average Waiting Time 

Fig. 5. Channel Utilization 

We also study the effect of the shape parameter \alpha on the waiting time, 
channel utilization and blocking probability. The range considered is 
1 < \alpha < 2. The voice fraction K was maintained at 0.2, 0.4, 0.6 and 0.8. A 
non-monotonic nature in the waiting time is observed from Figure 4 for 
\alpha < 1.6. The channel utilization (Figure 5) is almost the same for the 
different values of K. The blocking probability (Figure 6) increases monotonically 
with the increase in \alpha. If we have to provide strict delay guarantees, then we 
will have to admit fewer services into the system, thereby blocking most of them. 
But the channel utilization in that case will decrease. We can find the desirable 
operating point so that the system is able to perform within certain bounds. From 
these results we can decide on the fraction of voice and data which would give the 
desired level of performance. In other words, we can determine whether a service 
should be admitted into the system if the QoS requirements of all the on-going 
services are to be satisfied. 

Fig. 6. Blocking Probability 

6 Conclusion 

As cdma2000 grows to be one of the most important data networking 
technologies, it is important to see how effective it will be in supporting 
integrated voice and data services. Since the true nature of wireless data is 
still not known, we used a mix of Poisson-distributed voice packets and 
Pareto-distributed data packets. We conducted simulation experiments and generated 
services of different types with different requirements. The proportion of the 
load due to voice and data services was varied over the entire range, and 
the performance of the system with respect to channel utilization, waiting time 
and blocking probability was studied. The effect of the shape parameter of 
the Pareto-distributed data packets was also studied. These results give insight 
into the performance that can be expected from a system dealing with both 
voice and data. Also, for a system to perform with certain QoS guarantees, the 
operating point of the system can be determined. 
Abstract. This paper proposes a hard handover control scheme that supports 
Internet Protocol (IP) host mobility by means of IP address translation servers 
that translate an IP address into another, and a handover server that registers 
locations of IP mobile hosts and remotely controls the servers. A functional 
comparison of the proposed scheme and conventional ones such as Mobile 
IPv4, Mobile IPv6, and Cellular IP is also presented in this paper. 



1 Introduction 

The growing use of portable computers is increasing the demand for using the Internet 
in mobile situations. Over the Internet, packets are transferred from source Internet 
Protocol (IP) hosts to destination IP hosts. Each packet has a source IP address and a 
destination IP address. However, because the IP addresses have the dual role of 
serving as an identifier that specifies the IP host and serving as an identifier that 
specifies the location of the IP host, the IP address must be changed as the IP host 
moves from place to place. Since communication generally cannot continue if the IP 
address changes, use of the Internet in mobile situations has not been possible. 
Mobile IPv4 [1-4], Mobile IPv6 [5], and Cellular IP [6] have been proposed to solve 
this problem. The Mobile IPv4 scheme has the shortcomings described in (1) through 
(3) below: 

(1) There is a problem of triangular routing resulting in path redundancy. 

(2) When the Home Agent (HA) to which a mobile IP host is usually connected is 
located at a distance, updating of the host’s registration takes time. For example, 
if a mobile IP host whose HA is in France is taken to Japan, it could take a long 
time to update the registration, and timeout or loss of user packets could 
frequently occur. 

(3) Because inner IP packets are encapsulated in outer IP packets, the outer IP 
header size is increased by the size of the inner IP header. 

The Mobile IPv6 scheme also has the drawbacks described by (4) through (6) 
below: 

(4) For the initial packet transfer, it also has a path-redundant triangular routing 
problem. 

(5) It also has the long-distance registration problem. 
(6) A home address, which is an IP address assigned in the HA to a mobile IP host, 
and other information are placed in an extension header, which increases the size 
of the header. 

The Cellular IP scheme also has the shortcomings described in (7) through (9) 
below: 

(7) The routing entity has to be modified. 

(8) Cases in which it is necessary to change paths in a part of the network that 
comprises switching entities cannot be handled. 

(9) Dummy packets must be sent in order to maintain the routing cache mapping, 
even when there are no user data packets to be sent, thus wasting bandwidth in 
both the wired and wireless links. 

In order to solve these problems of the Mobile IPv4, Mobile IPv6 and Cellular IP 
schemes, this paper proposes a hard handover control scheme that supports IP host 
mobility. In this scheme, IP address translation servers controlled by a handover 
server are used to translate one IP address into another, which enables rerouting due 
to handover. 



2 Proposed IP Packet Transfer Control Scheme 

2.1 Network Configuration 

An example configuration of the mobile network that we propose for supporting IP 
host mobility is illustrated in Fig. 1. A mobile terminal (MT), which is an IP host 
moving in cellular service areas, communicates with a communication server, which 
we refer to as the Correspondent Entity (CE), via a wired backbone network. The 
wired network consists of IP nodes (IPN1 through IPN4), an IP address translation 
server (ATS4), base stations 1, 2 and 3, which are equipped with IP nodes (IPN5 
through IPN7) and IP address translation servers (ATS5 through ATS7), and a 
handover server (HS). The ATSs are incorporated in each mobile-network-edge- 
entity connected to IP terminals, IP servers, or other IP networks. Every ATS has an 
IP address translation table designated by the HS. Based on the table, the ATS 
rewrites the destination and source addresses of IP packets it receives and forwards 
the packets to their new destinations. The HS manages the locations of the MTs 
based on the location information sent from them and remotely controls the ATSs by 
using regularly updated network status information. In the proposed scheme, an IPN 
can be any node that is capable of transferring IP packets, including conventional IP 
routers, IP switches, and ATM switches supporting IP-over-ATM protocols. The 
MTs and CEs require no special or additional IP transfer functions except those 
already in the IP modules. Furthermore, modification of the routing entities (i.e., 
rewriting of the routing table at the time of handover of the MTs) is not needed, as it 
is with the Cellular IP scheme. 
Fig. 1. Network configuration of a mobile network supporting IP host mobility. 



2.2 IP Packet Flow 

The downlink IP packet flow before a handover is shown in Fig. 2. The MT is 
located in the radio zone of base station 1, and IP packets sent from the CE to the MT 
are transferred via base station 1. The IP packets have IP address information “Dst = 
M, Src = C” in their headers. When they are received at ATS4, the information is 
replaced with “Dst = X5.a, Src = X4.b” and the packets are forwarded to ATS5. 
When they are received at ATS5, the information is replaced with “Dst = M, Src = C” 
and the packets are forwarded to the MT. In this way, the IP packets are transferred 
from the CE to the MT via base station 1. 

The downlink IP packet flow after the handover is illustrated in Fig. 3. The MT is 
now located in the wireless zone of base station 2. Thus, IP packets sent from the CE 
to the MT have to be transferred via base station 2. The HS sends out commands to 
the ATSs and remotely controls the IP address translation so as to make such 
rerouting possible. When the packets having IP address information “Dst = M, Src = 
C” in their headers are received at ATS4, the information is replaced with “Dst = 
X6.a, Src = X4.b” and the packets are transmitted to ATS6. When the packets are 
received at ATS6, the information is rewritten into “Dst = M, Src = C” and the 
packets are forwarded to the MT. In this way, the IP packets are transferred from the 
CE to the MT via base station 2. 
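
The address rewriting performed along the downlink path can be sketched with a small table-driven model. This is an illustration, not the authors' implementation: the `Packet` and `AddressTranslationServer` types and their methods are hypothetical, and only the addresses (M, C, X4.b, X5.a, X6.a) and the ATS names are taken from the description above. 

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    dst: str
    src: str
    payload: str = ""


class AddressTranslationServer:
    """Rewrites (dst, src) pairs according to a table set by the handover server."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # (dst, src) -> (new_dst, new_src)

    def set_rule(self, match, rewrite):
        self.table[match] = rewrite

    def forward(self, packet):
        key = (packet.dst, packet.src)
        new_dst, new_src = self.table.get(key, key)  # unmatched packets pass unchanged
        return replace(packet, dst=new_dst, src=new_src)


ats4 = AddressTranslationServer("ATS4")
ats5 = AddressTranslationServer("ATS5")
ats6 = AddressTranslationServer("ATS6")

# Before handover: CE -> ATS4 -> ATS5 -> MT (via base station 1).
ats4.set_rule(("M", "C"), ("X5.a", "X4.b"))
ats5.set_rule(("X5.a", "X4.b"), ("M", "C"))
sent = Packet(dst="M", src="C", payload="data")
print(ats5.forward(ats4.forward(sent)))

# After handover: only the tables change; CE and MT keep their addresses.
ats4.set_rule(("M", "C"), ("X6.a", "X4.b"))
ats6.set_rule(("X6.a", "X4.b"), ("M", "C"))
print(ats6.forward(ats4.forward(sent)))
```

In both cases the packet delivered to the MT carries the original “Dst = M, Src = C” header, so the rerouting is invisible to the end hosts and the header size does not change. 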
Fig. 2. Downlink IP packet flow before handover. 

Fig. 3. Downlink IP packet flow after handover. 

Since the proposed scheme does not involve a HA, the triangular routing problem 
that arises with the Mobile IPv4 and IPv6 schemes does not occur. Furthermore, 
since the proposed scheme involves only rewriting of the IP address in the IP header, 
there is no increase in header size as there is with the Mobile IPv4 and IPv6 schemes. 
Further still, there is no rewriting of the routing tables of the routing entities as there 
is with the Cellular IP scheme, so the rerouting that accompanies handover is easy, 
even when the IPN is an IP switch. 



2.3 Control Performed by the Handover Server 

The HS registers the current location of the MTs and remotely controls the IP address 
translation of the ATSs. The location registration of an MT at the HS is illustrated in 
Fig. 4. When base station 2 detects an MT in the layer below the IP layer (e.g., 
wireless control), ATS6 sends a control packet to the HS to register the MT’s 
location. The HS then updates the registered location of the MT based on the 
information in the control packet. 

Figure 5 shows how IP address translation update commands are sent to the ATSs 
from the HS. After receiving the packet that registers the MT’s location from ATS6, the 
HS computes the new appropriate transit entities between the CE and MT (via base 
station 2), based on the network status information it maintains, and sends out the 
transit information in the form of a list of ATSs. The HS also creates an IP address 
translation update command that specifies how the IP address translation should be 
performed at each ATS that is now a transit entity, and then it sends the command 
directly to the ATSs. When the transit entities receive those packets, they update their 
own IP address translation tables. In the example shown in Fig. 5, ATS4 initially 
holds a translation table entry between IP addresses M <-> X5.a. Based on an update 
command from the HS, ATS4 updates its table so as to translate between 
M <-> X6.a. ATS5 initially holds translation table entries between X5.a <-> M and 
between X4.b <-> C. According to an update command from the HS, ATS5 
deletes the contents of its translation table. ATS6 initially has no entries in 
its translation table. Based on an update command from the HS, ATS6 adds 
entries to translate between X6.a <-> M and between X4.b <-> C. By remotely 
controlling the ATSs in this way, the HS controls hard handover for the MTs. 
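
The table updates described above can be illustrated with a small sketch. It is only a model under stated assumptions: the dictionary layout, the function names, and the restriction to the downlink direction are hypothetical, and the C -> X4.b entry at ATS4 is taken from the header rewriting described in Section 2.2 rather than from Fig. 5.

```python
# Hypothetical model of the address translation servers (ATSs) of Fig. 5.
# Each table maps an incoming address to the address that replaces it
# (downlink direction only; the layout is an illustrative assumption).

def translate(table, header):
    """Return a copy of the header with Dst/Src rewritten by one ATS."""
    return {field: table.get(addr, addr) for field, addr in header.items()}

def downlink(tables, path, header):
    """Pass a header through the ATSs on the given path, in order."""
    for ats in path:
        header = translate(tables[ats], header)
    return header

# State before the handover: the MT is reached via ATS5 (base station 1).
tables = {
    "ATS4": {"M": "X5.a", "C": "X4.b"},
    "ATS5": {"X5.a": "M", "X4.b": "C"},
    "ATS6": {},
}

def handover_update(tables):
    """Explicit (hard-state) update commands issued by the HS."""
    tables["ATS4"]["M"] = "X6.a"                        # M <-> X5.a becomes M <-> X6.a
    tables["ATS5"].clear()                              # old entries are deleted
    tables["ATS6"].update({"X6.a": "M", "X4.b": "C"})   # new entries are added

original = {"Dst": "M", "Src": "C"}
before_mid = translate(tables["ATS4"], original)
before_end = downlink(tables, ["ATS4", "ATS5"], original)
handover_update(tables)
after_mid = translate(tables["ATS4"], original)
after_end = downlink(tables, ["ATS4", "ATS6"], original)
```

In this model the header seen by the MT is unchanged before and after the handover; only the intermediate addresses used between the ATSs differ.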

In the proposed scheme, when a base station detects an MT in the layer below the 
IP layer (e.g., wireless control), it registers the MT’s location with the HS. The HS 
then initiates handover control, and performs the hard-state control, which is executed 
by issuing explicit control commands, of IP address translation. In the Cellular IP 
scheme, the mapping of the routing cache is automatically deleted if IP packets do not 
arrive within a certain period of time. This kind of control is referred to as soft-state 
control which is executed implicitly; i.e., explicit control commands are not used. 
The soft-state control used in the Cellular IP scheme requires that dummy packets are 
sent continuously to maintain the old mapping of the routing cache. Meanwhile, since 
the proposed scheme does not use such soft-state control, no dummy packets have to 
be sent. 

Fig. 4. Location Registration at the HS. 

Fig. 5. Commands to update IP address translation tables. 



3 Conclusion 

This paper has proposed a hard handover control scheme that supports IP host 
mobility. This scheme introduces IP address translation servers and a handover 
server. The translation servers translate an IP address into another. The handover 
server registers the location of IP mobile hosts, and remotely controls the translation 
servers. This makes it possible to transfer IP packets to a new destination after 
handover. This paper also has shown how the proposed scheme solves the problems 
associated with the Mobile IPv4, Mobile IPv6 and Cellular IP schemes. 



Abstract. Due to the rapid progress of information technology, computer 
systems with the client/server architecture have become common in 
multi-user computing environments. For single-server environments, the 
issue of remote login authentication has already been solved by a variety of 
schemes, but it has not yet been efficiently solved for multi-server Internet 
environments. In this paper, we present an efficient smart card based 
remote login authentication scheme for multi-server Internet environments, 
which can verify a single password for logging in to multiple authorized servers 
without using any password verification table. The objective of the new scheme 
is that any client can get service grant from multiple servers without 
repetitive registration to each server. The advantages of the proposed scheme are 
that repetitive registration for various servers is avoided, that network users 
can freely choose their preferred passwords, and that users can be deleted easily 
by the system. Moreover, security analyses of the impersonation and replay 
attacks on the proposed scheme validate the feasibility of the scheme. 



Keywords: Communications security, Internet, Password authentication, Smart 
card, Lagrange interpolating polynomial. 



1 Introduction 

With the distributed nature of computer and network systems, the achievement of 
privacy and security has become increasingly important. In the past decade, various 
kinds of authentication mechanisms have been developed for protecting information 
or resources from unauthorized users [1-4]. Among them, password authentication is 
the most acceptable and widely used mechanism because of its inexpensive cost, ease 
of use, and simple implementation [3, 5, 6]. 

In a traditional password authentication scheme, each user is equipped with an 
identity number ID and a secret password PW. A valid ID and its corresponding 
password PW are required whenever a user wishes to enter the network system. The 
most straightforward authentication approach is to construct, in advance, a directory 
which stores each user's ID and the corresponding PW. Each network user, say U_i, 
submits his/her identity U_ID_i and password U_PW_i during the login phase when 
attempting to enter the network system. The network system then searches through the 
password directory table to see if the submitted password agrees with that which is 
prestored in the table. If the result is yes, then U_i is recognized as an authorized user 
and is permitted to enter the network; otherwise, the login request is rejected. 
Apparently, from the viewpoint of memory protection, such an approach is too weak 
to ensure the security of the plain password table stored in the system. Also, it may 
introduce an additional burden on the system for managing the password table. 
Instead of directly storing the plain password table, alternative approaches [7, 8] can 
be applied as follows: 

(1) Encrypt users' passwords into test patterns through one-way functions, and 
then store these test patterns as a public verification table. When logging in, the 
submitted password is first encrypted as a test pattern and transmitted to the 
system. The system then verifies the received test pattern according to the 
corresponding entry in the verification table. This approach can protect users' 
passwords from being disclosed during login transmission. 

(2) Store the password in ciphertext form such that an intruder cannot easily derive 
the secret password even if the intruder knows the content of the verification 
table or can recognize the change in the verification table. 

Nevertheless, the following shortcomings still exist in these enhanced password 
authentication schemes: 

(1) An intruder may encrypt his/her password by the one-way function, and then 
append it to the verification table. After this is done, the intruder can penetrate 
into the network with the forged password planted in the verification table. 

(2) If the content of the verification table is modified or destroyed by a malicious 
intruder, then the entire system may break down. In addition, the verification 
table needs extra memory space reserved in the system. 

(3) The verification table will be greatly expanded after many users have joined 
the network. The management of the password table then becomes increasingly 
complex. 

Considering remote access systems, a potential impersonation problem exists. 
An intruder may intercept a valid login request (or an authentication message) and 
replay it later to pretend to be a legal user. In particular, this issue often occurs in a 
remote access system through insecure channels. In 1981, Lamport [9] proposed a 
remote password authentication scheme which can withstand such a replay attack, but 
the scheme is insecure if the encrypted passwords stored in the computer system are 
modified by intruders. Later, another scheme based on the signature scheme of a 
public key cryptosystem was proposed by Denning [2], but the scheme still needs 
password tables. In 1991, Chang and Wu [10] used the Chinese remainder theorem to 
implement password authentication in a remote access network environment. In their 
method, once a user wishes to login to the network, he/she must submit his/her 
identity and password associated with the smart card. A smart card originates from an 
IC memory card used in industry about ten years ago [11, 12]. The main 
characteristics of a smart card are a small size and low power consumption. Generally, 
a smart card contains a microprocessor which can quickly manipulate logical and 
mathematical operations, a RAM which is used as a data or instruction buffer, and a 
ROM which stores the user's secret key and the necessary public parameters and 
algorithmic descriptions of executing programs [5, 10, 13]. The merits of a smart card 
for password authentication are its high simplicity and efficiency of the login and 
authentication processes. However, in Chang and Wu's scheme users' passwords are 
generated by the password generation center, so it is inconvenient for a user to choose 
the password at his/her own wish. One year later, Chang and Laih [14] broke Chang 
and Wu's scheme. They assumed that the information stored on the smart card could 
be easily read out by a smart card holder. Using the public information of a smart 
card, the holder derives the secret decryption keys of the computer system. Therefore, 
the holder can find another user's password by intercepting the login transmission 
message. Recently, Chang and Hwang [15] proposed another remote password 
scheme without authentication tables to solve the remote password authentication 
problem. In their scheme, users are not allowed to freely select their own identities 
and passwords, and the system cannot easily delete a legal user, either. There is a 
common feature for the above schemes without authentication tables. The feature is 
that the system cannot delete users easily. If a password held by a user becomes 
unauthorized, then the system has to reconstruct all passwords for the users. Though 
the system does not keep any secret data, it also loses the autonomy for managing its 
users. This is not practical in the real world. Additionally, Wang and Chang's [16] and 
Wu's [13] schemes permit users to freely choose their preferred passwords, but they 
cannot easily delete a legal user from the computer system, and cannot be efficiently 
adopted in multi-server Internet environments, either. 

Based on smart cards, we propose a new remote login authentication scheme for 
multi-server Internet environments in this paper, which can verify a single password 
for logging in to multiple authorized servers without using any password verification table. 
For the environment of single server, the issue of remote login authentication has 
already been solved by a variety of schemes, but it has not been efficiently solved for 
multi-server Internet environments yet, where multi-server Internet environments may 
include FTP servers, file servers, database servers, or WWW servers. The objective of 
the new scheme emphasizes that any client can get service grant from multiple servers 
without repetitive registration to each server. The proposed scheme's advantages 
include that not only repetitive registration for various servers is avoided, but also the 
network users can freely choose their preferred passwords and be deleted easily by the 
system. We can show that in our scheme intruders cannot derive any secret 
information from the intercepted login transmission message. Moreover, security 
analyses about the impersonation and replay attacks on the proposed scheme also 
validate the feasibility of the scheme. 

The rest of this paper is organized as follows. In Section 2, we depict the 
proposed smart card based remote login authentication scheme for multi-server 
Internet environments. Section 3 shows that the new proposed scheme can not only 
withstand both the impersonation and replay attacks, but also prevent intruders from 
obtaining any secret information by intercepting the login transmission message. 
Finally, some concluding remarks are included in the last section. 



2 The Proposed Scheme 

The proposed scheme can be divided into the following four stages: the system setup 
stage, the user registration stage, the login stage, and the server authentication stage. 
The central authority (CA) selects several public and secret system parameters in the 
system setup stage. In the user registration stage, the user chooses a password known 
only to himself/herself, and applies for the access privilege for each server. CA then 
delivers a smart card to the registered user. The smart card contains the user's identity 
and some necessary public and secret parameters used in the login and server 
authentication stages. When logging in, the user first inserts his/her smart card into a 
terminal and keys in his/her password. The smart card then generates an 
authentication message and transmits it to each server. After receiving the user's login 
request (i.e. the authentication message sent from the smart card), each server can 
easily validate the login request by employing a verification equation derived from the 
authentication message. 

[The system setup stage] 

Initially, as in the RSA scheme [17], CA needs to select public keys N and e and 
secret keys p_1, p_2, d, and φ(N) as follows: 

(1) Select two distinct large primes p_1 and p_2, and calculate 

$$N = p_1 \cdot p_2 \quad \text{and} \quad \phi(N) = (p_1 - 1)(p_2 - 1),$$

where φ is called the Euler totient function [2, 17]. 

(2) Select a public key e to be an integer with gcd(e, φ(N)) = 1. 

(3) Find a secret key d, the multiplicative inverse of e, such that 

$$e \cdot d \equiv 1 \pmod{\phi(N)}.$$
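
The three setup steps can be traced with toy numbers. The sketch below is only illustrative: the primes are far too small for real use, and the search for the smallest admissible e is an assumption made for the example, not a rule stated by the scheme.

```python
from math import gcd

# (1) Two distinct primes (toy values; real systems use large random primes).
p1, p2 = 1009, 1013
N = p1 * p2
phi = (p1 - 1) * (p2 - 1)

# (2) Public key e with gcd(e, phi(N)) = 1; here simply the smallest odd candidate.
e = 3
while gcd(e, phi) != 1:
    e += 2

# (3) Secret key d, the multiplicative inverse of e modulo phi(N) (Python 3.8+).
d = pow(e, -1, phi)

# Round trip on a sample value: raising to e and then to d returns the input.
sample = 42
round_trip = pow(pow(sample, e, N), d, N)
```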

[The user registration stage] 

Assume that a new user U_i registers to the multi-server system including m servers 
S_1, S_2, ..., and S_m. Before U_i gets service grant from multiple servers, he/she must 
submit his/her application to CA in advance as below: 

(1) U_i first chooses his/her own identity U_ID_i and password U_PW_i, and presents 
them to CA. 

(2) After verifying the qualification of U_i, CA issues the identity U_ID_i and the 
password U_PW_i to him/her. 

(3) CA computes U_i's secret key 

$$U\_R_i = g^{U\_PW_i \cdot d} \pmod N,$$

where g is a fixed primitive element in the Galois field GF(N) [18]. 

(4) Suppose that U_i can get service grant only from servers S_1, S_2, ..., and S_r, 1 ≤ r ≤ 
m, and the service periods of these servers for U_i are E_T_i1, E_T_i2, ..., and E_T_ir, 
respectively. Moreover, it is assumed that the service periods of the other servers 
S_{r+1}, S_{r+2}, ..., and S_m in the system for U_i are all zero. CA then constructs a 
Lagrange interpolating polynomial f_i(X) for U_i in the following: 

$$f_i(X) = \sum_{j=1}^{m} (U\_ID_i + E\_T_{ij}) \, \frac{X - U\_ID_i}{S\_SK_j - U\_ID_i} \prod_{k=1,\, k \neq j}^{m} \frac{X - S\_SK_k}{S\_SK_j - S\_SK_k} + U\_R_i \prod_{y=1}^{m} \frac{X - S\_SK_y}{U\_ID_i - S\_SK_y} \pmod N$$

$$= a_m X^m + a_{m-1} X^{m-1} + \dots + a_1 X + a_0 \pmod N,$$

where S_SK_j represents the secret key of the jth server in the system. 

(5) CA stores the interpolating polynomial f_i(X) in the secret data space of U_i's 
smart card U_SC_i to prevent it from being disclosed or modified. In addition, 
CA stores U_ID_i and a one-way function h(X, Y) in the protected data space of 
U_SC_i. Note that in our scheme, the secret data space of the smart card is for 
storing secret information, which cannot be read directly from the card 
and is used only for internal computations of the smart card. However, the data 
stored in the protected data space can be obtained by using the secret key. 

(6) CA delivers the smart card U_SC_i to U_i through a secure channel. 
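
The polynomial built in Step (4) can be sketched numerically. The example assumes that f_i(X) is the polynomial of degree m fixed by the conditions f_i(S_SK_j) = U_ID_i + E_T_ij and f_i(U_ID_i) = U_R_i, which are the values the server authentication stage later reads from it. The toy modulus, g = 2, the keys, and all names are hypothetical.

```python
# Toy parameters, assumed for illustration only.
p1, p2, e, g = 1009, 1013, 5, 2
N = p1 * p2
d = pow(e, -1, (p1 - 1) * (p2 - 1))    # CA's secret key (Python 3.8+)

def lagrange_eval(points, x, n):
    """Evaluate at x, modulo n, the polynomial through the given points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        term = yj
        for k, (xk, _) in enumerate(points):
            if k != j:
                # pow(..., -1, n) raises ValueError if xj - xk is not invertible mod n.
                term = term * (x - xk) * pow(xj - xk, -1, n) % n
        total = (total + term) % n
    return total

s_sk = [101, 202, 303]        # servers' secret keys S_SK_1 .. S_SK_m (m = 3)
e_t = [365, 180, 0]           # service periods; 0 marks a server that grants no service
u_id, u_pw = 4242, 123457     # user's identity and password
u_r = pow(g, u_pw * d, N)     # user's secret key U_R_i

# m + 1 interpolation points: one per server, plus (U_ID_i, U_R_i).
points = [(sk, (u_id + et) % N) for sk, et in zip(s_sk, e_t)] + [(u_id, u_r)]

def f_i(x):
    return lagrange_eval(points, x, N)

server_values = [f_i(sk) for sk in s_sk]
key_value = f_i(u_id)
```

Because N is composite, every difference between interpolation abscissas has to be invertible modulo N; the toy values above were chosen so that this holds.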

[The login stage] 

When the registered user U_i logs in to the server S_j (1 ≤ j ≤ m), the following steps need to 
be performed by his/her smart card U_SC_i: 

(1) U_i first inserts his/her own smart card U_SC_i into a card reader and keys in 
his/her password U_PW_i. Afterwards, U_SC_i gets a timing sequence t from the 
system, where t is used as a timestamp for the login request. 

(2) U_SC_i generates a secret random number r_1 and computes the two values 
C_1 and C_2 in the following: 

$$C_1 = g^{e \cdot r_1} \pmod N,$$

$$C_2 = g^{U\_PW_i} \cdot g^{r_1 \cdot h(C_1, t)} \pmod N.$$

(3) Assume that S_ID_j is the identity of the jth server, which can be derived from the 
following formula: 

$$S\_ID_j = g^{S\_SK_j} \pmod N.$$

U_SC_i then computes the value P as follows: 

$$P = (S\_ID_j)^{e \cdot r_1} \pmod N = (g^{S\_SK_j})^{e \cdot r_1} \pmod N = g^{S\_SK_j \cdot e \cdot r_1} \pmod N.$$

(4) Given 1, 2, ..., m, and P, U_SC_i calculates f_i(1), f_i(2), ..., f_i(m), and f_i(P). 

(5) U_SC_i constructs an authentication message M = {U_ID_i, t, C_1, C_2, f_i(1), f_i(2), 
..., f_i(m), f_i(P)} and transmits it to the server S_j. 

[The server authentication stage] 

Let t_now represent the server S_j's current date and time. After receiving the 
authentication message M sent from the login user U_i, S_j performs the following steps 
to authenticate U_i's login request: 

(1) Validate the format of U_i's identity U_ID_i. If it is invalid, then reject the login 
request. 

(2) Check whether t_now is too much later than the login timestamp t by 
examining if t_now - t > ΔT, where ΔT is the endurable transmission delay between 
the login terminal and the server S_j. If t_now - t > ΔT, then the authentication 
message M may have been replayed by some malicious attacker, and thus the server 
authentication stage has to be terminated. 

(3) Find the value P by employing the value C_1 and the secret key S_SK_j: 

$$P = C_1^{S\_SK_j} \pmod N = (g^{e \cdot r_1})^{S\_SK_j} \pmod N = g^{e \cdot r_1 \cdot S\_SK_j} \pmod N.$$

(4) By using these m + 1 points (1, f_i(1)), (2, f_i(2)), ..., (m, f_i(m)), and (P, f_i(P)), 
reconstruct the original interpolating polynomial 

$$f_i(X) = a_m X^m + a_{m-1} X^{m-1} + \dots + a_1 X + a_0 \pmod N.$$

(5) Find U_i's secret key U_R_i by computing the following formula: 

$$f_i(U\_ID_i) = U\_R_i \pmod N.$$

(6) Verify if the following equation holds by using C_1, C_2, and U_R_i: 

$$\frac{C_2^{\,e}}{C_1^{h(C_1, t)} \cdot (U\_R_i)^{e^2}} = 1 \pmod N. \qquad (1)$$

If the correspondence holds, then the login request is valid and S_j accepts the 
login user U_i; otherwise, it rejects U_i. The following demonstrates why the 
verification procedure described in Eq. (1) works correctly: 



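$$\begin{aligned}
\frac{C_2^{\,e}}{C_1^{h(C_1, t)} \cdot (U\_R_i)^{e^2}} \pmod N
&= \frac{\left(g^{U\_PW_i} \cdot g^{r_1 \cdot h(C_1, t)}\right)^{e}}{\left(g^{e \cdot r_1}\right)^{h(C_1, t)} \cdot \left(g^{U\_PW_i \cdot d}\right)^{e^2}} \pmod N \\
&= \frac{g^{e \cdot U\_PW_i} \cdot g^{e \cdot r_1 \cdot h(C_1, t)}}{g^{e \cdot r_1 \cdot h(C_1, t)} \cdot g^{e \cdot U\_PW_i}} \pmod N \\
&= \frac{g^{e \cdot U\_PW_i}}{g^{e \cdot U\_PW_i}} \pmod N \\
&= 1 \pmod N,
\end{aligned}$$

where the second step uses e * d = 1 (mod φ(N)). 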

(7) Calculate f_i(S_SK_j) by using S_j's secret key S_SK_j as below: 

$$f_i(S\_SK_j) = U\_ID_i + E\_T_{ij} \pmod N. \qquad (2)$$

In order to know when the service period of S_j for U_i ends, S_j can derive it from Eq. 
(2) because 

$$E\_T_{ij} = f_i(S\_SK_j) - U\_ID_i \pmod N.$$

S_j then compares E_T_ij with its current date and time t_now to see if the service 
period has expired. If E_T_ij is earlier than t_now, then S_j will terminate the service to U_i (i.e., U_i's 
access privilege to S_j is deleted from now on). 
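
The login and server authentication stages can be traced end to end with toy numbers. This is a minimal sketch under stated assumptions, not the authors' implementation: it takes C_1 = g^(e*r_1), C_2 = g^(U_PW_i) * g^(r_1*h(C_1,t)) and U_R_i = g^(U_PW_i*d), all modulo N, and checks Eq. (1) in the division-free form C_2^e = C_1^h(C_1,t) * (U_R_i)^(e^2) (mod N). The hash-based h, the tiny modulus, g = 2, and all names are hypothetical.

```python
import hashlib
from math import gcd

# Toy parameters, assumed for illustration only (far too small for real use).
p1, p2, e, g = 1009, 1013, 5, 2
N, phi = p1 * p2, (p1 - 1) * (p2 - 1)
d = pow(e, -1, phi)                      # CA's secret key (Python 3.8+)

def h(c1, t):
    """Hypothetical stand-in for the one-way function h(X, Y)."""
    return int.from_bytes(hashlib.sha256(f"{c1}|{t}".encode()).digest(), "big")

def interpolate(points, x):
    """Lagrange evaluation at x of the polynomial through `points`, mod N."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        term = yj
        for k, (xk, _) in enumerate(points):
            if k != j:
                term = term * (x - xk) * pow(xj - xk, -1, N) % N
        total = (total + term) % N
    return total

# Registration stage (CA side).
s_sk = [101, 202, 303]                   # servers' secret keys
e_t = [365, 180, 0]                      # service periods for this user
u_id, u_pw = 4242, 123457
u_r = pow(g, u_pw * d, N)
ca_points = [(sk, (u_id + et) % N) for sk, et in zip(s_sk, e_t)] + [(u_id, u_r)]

def login(j, t, r1=777):
    """Login stage on the smart card, addressed to the server with index j."""
    m = len(s_sk)
    s_id_j = pow(g, s_sk[j], N)
    # Toy-modulus safeguard: skip r1 values for which P - k is not invertible mod N.
    while any(gcd(pow(s_id_j, e * r1, N) - k, N) != 1 for k in range(1, m + 1)):
        r1 += 1
    c1 = pow(g, e * r1, N)
    c2 = pow(g, u_pw, N) * pow(g, r1 * h(c1, t), N) % N
    p = pow(s_id_j, e * r1, N)
    f_values = [interpolate(ca_points, x) for x in range(1, m + 1)]
    return {"u_id": u_id, "t": t, "c1": c1, "c2": c2,
            "f_values": f_values, "f_p": interpolate(ca_points, p)}

def server_verify(msg, s_sk_j):
    """Steps (3) to (7): returns (accepted, recovered service period)."""
    p = pow(msg["c1"], s_sk_j, N)                                  # Step (3)
    points = list(enumerate(msg["f_values"], start=1)) + [(p, msg["f_p"])]
    u_r_rec = interpolate(points, msg["u_id"])                     # Steps (4), (5)
    lhs = pow(msg["c2"], e, N)                                     # Step (6)
    rhs = pow(msg["c1"], h(msg["c1"], msg["t"]), N) * pow(u_r_rec, e * e, N) % N
    e_t_rec = (interpolate(points, s_sk_j) - msg["u_id"]) % N      # Step (7)
    return lhs == rhs, e_t_rec

msg = login(j=0, t=1000)
accepted, period = server_verify(msg, s_sk[0])
forged = dict(msg, c2=(msg["c2"] + 1) % N)
forged_accepted, _ = server_verify(forged, s_sk[0])
```

The server needs only its own secret key S_SK_j and the public values; no password or verification table is consulted at any point.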



3 Security Analysis 

In this section, the security analyses of the proposed scheme are given. We will show 
that the proposed scheme can withstand the following possible attacks: 

(Attack 1) The impersonation attack 

An intruder may impersonate U_i by forging a valid authentication message 
M' = {U_ID_i, t', C_1', C_2', f_i(1), f_i(2), ..., f_i(m), f_i(P)} and replaying it to the 
server. In order to pass the check of Eq. (1) in Step (6) of the server 
authentication stage, the intruder may calculate C_1', C_2' and U_R_i' from the 
forged secret random number r_1', U_i's password U_PW_i', login timestamp t', and 
CA's secret key d' as follows: 

$$C_1' = g^{e \cdot r_1'} \pmod N,$$

$$C_2' = g^{U\_PW_i'} \cdot g^{r_1' \cdot h(C_1', t')} \pmod N,$$

$$U\_R_i' = g^{U\_PW_i' \cdot d'} \pmod N.$$

However, as long as the forged d' is used in computing U_R_i', Eq. (1) cannot be 
satisfied because our proposed scheme is constructed based on the security of 
the RSA scheme [17], which relies on the difficulty of factoring a large number 
into its prime factors. That is, if the intruder wants to pass the verification of Eq. 
(1), then he/she must derive U_R_i' in the following form: 

$$U\_R_i' = g^{U\_PW_i' \cdot d} \pmod N.$$

But it is very difficult for the intruder to find d in our proposed RSA-based 
scheme. Consequently, the impersonation attack can be withstood successfully. 

(Attack 2) The replay attack 

Two possible replay attacks are considered in the following. First, an 
intruder intercepts an authentication message and tries to masquerade as the 
sender by replaying it without modifying any content of the authentication 
message; secondly, an intruder intercepts an authentication message and replays 
a forged authentication message modified from the original one. Since the 
authentication message is time-dependent, the first approach will be excluded 
by the check in Step (2) of the server authentication stage. To pass the check 
in Step (2) of the server authentication stage, the intruder must change the 
timestamp t to t' so that t_now - t' ≤ ΔT holds. Once the timestamp t is 
changed, the value C_2 must also be changed such that the intruder can pass the 
verification of Eq. (1) at the server authentication stage. That is, the intruder 
must find C_2' to satisfy 

$$\frac{(C_2')^{e}}{C_1^{h(C_1, t')} \cdot (U\_R_i)^{e^2}} = 1 \pmod N.$$

However, based on the above analysis for the impersonation attack and the 
difficulty of solving the intractable discrete logarithm problem [18, 19], it is 
impossible for the intruder to successfully forge a valid C_2' to pass the server 
authentication without knowing U_i's password U_PW_i. Hence, the second 
approach will also be excluded by the check in Eq. (1). 
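
The freshness test of Step (2) amounts to a single comparison. The sketch below assumes timestamps are plain numbers of seconds and that ΔT is a fixed constant; the value of ΔT and the function name are hypothetical.

```python
DELTA_T = 5.0   # hypothetical endurable transmission delay, in seconds

def is_fresh(t_login: float, t_now: float, delta_t: float = DELTA_T) -> bool:
    """Step (2): the request is rejected when t_now - t > delta_t."""
    return t_now - t_login <= delta_t

on_time = is_fresh(t_login=100.0, t_now=103.0)    # 3 s old, within the tolerance
replayed = is_fresh(t_login=100.0, t_now=160.0)   # replayed a minute later
```

A message replayed unchanged fails this test once ΔT has elapsed, which is why the intruder in the second approach is forced to alter t and, with it, C_2.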

(Attack 3) Extending the authorized period of server service by users 
themselves 

When S_j's service period for U_i expires, he/she should resubmit his/her 
application to CA for extending the authorized period of server service if he/she 
wants to continue the server service. We are convinced that it is 
impossible for U_i to employ his/her original smart card to extend the service 
from S_j at the expiration of S_j's service. As described in Step (4) of the user 
registration stage, each server S_j's (1 ≤ j ≤ m) service period E_T_ij for U_i is 
hidden in the interpolating polynomial f_i(X) constructed from all servers' secret 
keys S_SK_1, S_SK_2, ..., and S_SK_m. As long as U_i does not get the correct S_SK_j, 
he/she cannot construct a valid interpolating polynomial to pass the 
server authentication by himself/herself. In other words, U_i cannot succeed in 
passing the server authentication even if he/she attempts to construct f_i'(X) 
to deceive S_j by finding m forged keys S_SK_j'. This is because he/she cannot pass the 
authentication in Eq. (1) owing to the inability to compute U_R_i such that 
f_i'(U_ID_i) = U_R_i (mod N). Thus, the proposed scheme can prevent legal users 
from extending the authorized period of server service by themselves. 

(Attack 4) Intercepting the authentication message to obtain secret data 

Our proposed scheme is secure against the attack of intercepting U_i's 
authentication message M = {U_ID_i, t, C_1, C_2, f_i(1), f_i(2), ..., f_i(m), f_i(P)} to 
obtain secret data. Based on the difficulty of solving the discrete logarithm 
problem, an intruder cannot derive the secret data r_1 and U_PW_i from C_1 and C_2. 
On the other hand, since the intruder cannot derive the value P without knowing 
S_j's secret key S_SK_j, he/she certainly cannot reconstruct the polynomial of 
degree m to get servers' secret keys only from these m points (1, f_i(1)), (2, f_i(2)), 
..., and (m, f_i(m)). 



4 Conclusions 

Password authentication is the most widely used authenticating technique because of 
its ease of implementation, user friendliness, and low cost. In this paper, we have 
proposed an efficient remote login authentication scheme for multi-server Internet 
environments using smart cards. The smart card plays an important role in our 
scheme. Using it together with a user's preferred password, the login request message 
is transmitted securely and the verification can be performed easily. For each 
legitimate network user, our proposed scheme can verify his/her single password for 
logging in to multiple authorized servers without constructing the password or verification 
table for authenticating login requests, and he/she can get service grant from multiple 
servers without repetitive registration to each server. Hence, we eliminate the threat of 
revealing the entire list of passwords, even if in ciphertext form, and obtain an 
efficient smart card based remote login authentication scheme for multi-server 
Internet environments. 

In addition, in our scheme the system can delete illegal users easily and periodically. 
The privilege of each legal user for multi-server access is only validated before some 
date. When the authorized period is exceeded, the smart card held by a user will lose 
its authority automatically. This means that the corresponding user will become illegal 
from then on. Furthermore, it is also difficult for a legal user to extend his/her 
authorized period by himself/herself. 

To ensure the security of sending the login request message over the public channel, 
the communication timestamp is provided in the authentication phase to withstand the 
potential attack of replaying a previously intercepted login request under the 
assumption of universally synchronized clocks. Besides, we have also shown that the 
proposed scheme is secure against the attack on forging authentication messages. 
Therefore, it is believed that our proposed remote password authentication for 
multi-server Internet environments is practical in real applications. 



Abstract. Routing in the Internet is based on the best-effort mechanism, 
wherein a router forwards a packet to minimize the number of hops to the 
destination. Also in the Internet, all packets are treated the same independent of 
their size. We propose the framework of NetLets to enable applications to 
send data packets to the destination with certain guarantees on the end-to-end 
delay. NetLets employ built-in instruments to measure the bandwidths and 
propagation delays on the links, and compute the minimum end-to-end delay 
paths for data packets of various sizes. In our experiments, the paths 
selected by our system using the measurements are indeed the minimum 
end-to-end delay paths, and our method outperformed the best-effort mechanism based 
on the hop count. 



1 Introduction 

Routing is crucial in determining the end-to-end performance of distributed 
computing applications over the Internet. Current routing protocols select the paths to 
minimize the number of hops to the destination. These protocols sometimes ignore 
viable alternate routes with higher bandwidths or less congestion, both of which are 
vital factors in deciding the end-to-end delays. Furthermore, one of the reasons for 
the unsatisfactory end-to-end path performance of the present Internet is the 
inadequate measurement infrastructure. Paxson et al. [8] point out the need for 
infrastructure to diagnose performance problems, measure the performance of network paths, 
and assess the performance of different Internet service providers. In this paper, we 
employ the measurements, but for the sole purpose of computing the minimum 
end-to-end delay paths. 

Quality-of-Service (QoS) guarantees are currently needed by a variety of 
applications such as video conferencing and real-time distributed simulation experiments. 
To facilitate the growing number of such applications, new protocols and 
infrastructures are being developed (MPLS, DiffServ and IntServ [3,6]) for ensuring 
end-to-end performance. The real question is: given the infrastructure currently in place, 
is it practical to replace it all with a new one that offers QoS guarantees for 
end-to-end performance? The answer is no, at least in the near future, but there can be a 
middle ground. This is achieved by incorporating daemons, called NetLets, above the 
existing network layer to provide end-to-end performance beyond what is possible in 
the current Internet. NetLets will be deployed on certain intermediate routers (if 
possible) and on host machines. Two NetLets are connected via a virtual link if they 
are connected by an IP (Internet Protocol) path containing zero or more routers of the 
underlying network. Figure 1 shows an example of a virtual network overlaid on a 
real network. 






Fig. 1: The shaded nodes house NetLets. A virtual link from node A to node F corresponds to 
the path A-B-D-F in the underlying network. 

We now show that the packet size has a strong influence on the end-to-end delay 
of routing paths. It is indeed possible for the same source and destination pair to 
have different minimum end-to-end delay paths for packets of different sizes. 
Consequently, the routing policy based on minimizing the hop count is only sub-optimal in 
general. The size of an IP packet can vary from a minimum of 21 bytes (20 bytes of 
header and 1 byte of data) to a maximum of 65,535 bytes (20 bytes of header and 
65,515 bytes of data). Consider the network shown in Figure 2. The nodes A and D 
are the source and destination, respectively. The bandwidths on the links have been 
chosen to be either DS1 or DS1C, and all links are assumed to have a propagation 
delay of 10 µsecs. Then, to a first order of approximation, the end-to-end delay of a 
link is proportional to $\sigma / B + d$, where σ is the size of the data packet, B is the 
bandwidth on the link and d is the propagation delay of the link. In the table below 
the network diagram, the end-to-end delays of the routes from A to D for various packet 
sizes are shown. Clearly, for the minimum sized IP packet one should choose the 
path A-B-D, while for the maximum sized IP packet one should choose the path 
A-B-C-D. At the intermediate nodes, we have not included the queuing and processing 
delays, collectively called the node delays. For the maximum sized IP packet, we 
choose A-B-D if the node delay at C is greater than 7 msecs. The node delays are 
often much harder to estimate than the bandwidth and link delays. These delays can 
be handled by incorporating probabilistic components into the end-to-end delays to 
derive similar conclusions but with a certain probability [11]. 






End-To-End Delay from Node A to D 

Packet Size (bytes)    A-B-D Path    A-B-C-D Path 
21                     182 µsecs     190 µsecs 
65535                  506 msecs     499 msecs 


Fig. 2. A network with A and D as the source and destination, respectively. The bandwidth 
and propagation delays on the links are indicated. The table indicates the end-to-end delays for 
minimum and maximum sized IP packet using different paths. 
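
The entries in the table can be reproduced with a short calculation. The sketch below 
is illustrative only. It assumes the standard line rates of 1.544 Mbit/s for DS1 and 
3.152 Mbit/s for DS1C, and a link assignment that is inferred from the table rather 
than read from the figure: links A-B, B-C and C-D at DS1C and link B-D at DS1. The 
names `LINKS` and `path_delay` are hypothetical. 

```python
# Assumed line rates in bits per second (standard values, not stated in the text).
DS1 = 1.544e6
DS1C = 3.152e6
PROPAGATION = 10e-6  # 10 microseconds per link, as stated in the text

# Link assignment inferred from the table (an assumption, not taken from the figure).
LINKS = {
    ("A", "B"): DS1C,
    ("B", "C"): DS1C,
    ("C", "D"): DS1C,
    ("B", "D"): DS1,
}


def path_delay(path, size_bytes):
    """Sum sigma/B + d over the links of a store-and-forward path (seconds)."""
    bits = size_bytes * 8
    return sum(bits / LINKS[(u, v)] + PROPAGATION for u, v in zip(path, path[1:]))


for size in (21, 65535):
    abd = path_delay("ABD", size)
    abcd = path_delay("ABCD", size)
    print(f"{size:>6} bytes: A-B-D = {abd * 1e3:.3f} ms, A-B-C-D = {abcd * 1e3:.3f} ms")
```

Under these assumptions the computed delays round to the values in the table, and the 
7 msec margin for the maximum-sized packet is the difference between the two paths. 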

As evident from the above example, protocols that ignore the available band- 
widths and propagation delays will produce routing paths with sub-optimal end-to- 
end performance in certain cases. We present a framework that accounts for the 
bandwidth and propagation delays: 

1. software instruments are used to measure the bandwidths and propagation 
delays on the links; 

2. bandwidth and delay estimates are used to determine the minimum end-to- 
end delay paths; and 

3. interfaces are provided for the applications to route data packets along the 
computed paths. 

The analytical basis for this approach was first provided in [11], where it was shown 
that measurements are sufficient (with a specified probability) to compute the paths 
with the minimum end-to-end delay, irrespective of the delay distributions. It is 
important to note that it is not necessary to derive delay distributions to provide 
this type of end-to-end QoS, and one can achieve useful performance based on 
measurements alone. 

We compare the paths chosen by our technique with those chosen by the best- 
effort mechanism in an experimental setup of workstations that route data packets 
using the Internet Protocol. In the experiments, the paths selected by our system 
based on the measurements indeed have the minimum end-to-end delay. Our method 
outperformed the best-effort mechanism, and in this sense our results provide 
experimental evidence for the analytical results in [11,15]. 



2 Related Research 

Research on the end-to-end performance of path selection protocols has received much 
attention in recent years. For example, Savage et al. [17], in their Detour system, 
point out various performance problems in the Internet, including inefficiencies in 
both the routing and transport layer protocols. Collins [5] also suggests the 
formation of a virtual network similar to the one proposed in this paper. There are 
many differences between our work and the Detour system. First, the Detour system 
does not explicitly minimize the end-to-end delay between source and destination 
pairs. Second, the measurement mechanism suggested by the Detour system involves the 
use of utilities such as traceroute and ping. Our measurements (link bandwidth and 
link propagation delay) are obtained using the actual transport layer segments, 
namely TCP (Transmission Control Protocol) segments. That is, the measurements are 
done using transport layer units that are actually routed on the network rather than 
control data packets sent via the traceroute and ping programs. With the deployment 
of firewalls, Internet Control Message Protocol (ICMP) traffic might be treated 
significantly differently from TCP or User Datagram Protocol (UDP) traffic; for 
example, some firewalls disable responses to ping and provide no or misleading 
responses to traceroute. The TCP-based measurement scheme used here is vital to 
proving the analytical guarantees of [11,15], and no such guarantees can be given if 
only traceroute and ping measurements are used. Third, unlike the Detour system, we 
do not modify the existing IP layer but direct the IP layer to forward the packets 
via the other NetLets. 

Paxson [7] has developed measurement tools for determining the end-to-end delay, 
packet loss, and actual routing paths on the Internet. RFC 2330 [8] examines the 
framework for measuring IP performance metrics. Most of this research points out the 
sensitivity of the measurements to changes in the clocks. For example, in a 
time-synchronized distributed system, the one-way link time can be measured by 
time-stamping a packet before sending it and time-stamping it again when it reaches 
the destination. Since such synchronization assumptions are not plausible in a 
general distributed system, Paxson [7] suggested techniques for arriving at 
reasonably accurate packet delay measurements. While measurements in general are 
motivated by several purposes, such as traffic modeling and fault diagnosis, the goal 
of our measurements is limited to reducing the end-to-end delays. 

IP routing involves designating groups of nodes (routers and hosts) to form an 
Autonomous System (AS). The entire system of routers and hosts is grouped into 
different ASs, and communication between any two nodes within an AS is carried out 
using the Interior Gateway Protocol (IGP). The Border Gateway Protocol (BGP) 
treats each AS as a single node and provides the routing protocols between two nodes 
that are in different ASs. Both IGP and BGP use the hop count as a performance 
measure to determine the end-to-end path [9]; that is, given two paths between a 
source and destination pair, the path with fewer links will be chosen. 

To transfer messages, there are two basic routing mechanisms: circuit switching 
(also called pipelining) and packet switching based on the store-and-forward method. 
When data is delivered using circuit switching, bit streams of data are transferred at a 
fixed rate from the source to destination without buffering. Over the packet switched 




188 N.S.V. Rao, S. Radhakrishnan, and B.-Y. Choel 



networks, the entire data is stored at every intermediate node before being forwarded 
to the next node. Telephone networks belong to the circuit switching paradigm, and IP 
networks belong to the packet switching paradigm. For a path $P$, the path-delay 
$D(P)$ is the sum of the propagation delays of all links along the path. The 
end-to-end delay of a path $P$ can be computed by the formula 
$T = \sigma / B(P) + D(P)$, where $B(P)$ is the path-bandwidth, which is computed 
according to the routing mechanism. Since in circuit switching the data is 
transferred along the route at a fixed rate, $B(P)$ is the minimum bandwidth of the 
links along the path. In the case of packet switching, since the incoming data is 
stored temporarily at each node and then transmitted on an outgoing link, a 
transmission time of $\sigma / B(e)$ is required at each node $v$, where $e$ is the 
outgoing link of $v$. Thus, for the packet switching mechanism, the path-bandwidth is 
$B(P) = 1 / \sum_{e \in P} 1/B(e)$, where $P$ is the routing path and $e$ ranges over 
the links on $P$ [14]. 

In summary, the routing mechanism has a significant effect on the end-to-end delay 
and must be considered in computing the optimal end-to-end delay paths. 
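
The difference between the two mechanisms can be illustrated with a minimal sketch. 
It assumes that a path is given as a list of (bandwidth, propagation delay) pairs. 
The function names and the two-link example values are hypothetical and are not 
taken from the experiments. 

```python
def circuit_switched_delay(size, links):
    """T = size / B(P) + D(P), with B(P) the minimum link bandwidth on the path."""
    path_bandwidth = min(bandwidth for bandwidth, _ in links)
    path_delay = sum(delay for _, delay in links)
    return size / path_bandwidth + path_delay


def packet_switched_delay(size, links):
    """T = size / B(P) + D(P), with B(P) = 1 / sum(1 / B(e)) for store-and-forward."""
    path_bandwidth = 1.0 / sum(1.0 / bandwidth for bandwidth, _ in links)
    path_delay = sum(delay for _, delay in links)
    return size / path_bandwidth + path_delay


# Hypothetical two-link path: bandwidths in bits per second, delays in seconds.
links = [(1.0e6, 0.001), (2.0e6, 0.002)]
size = 1.0e6  # message size in bits

print("circuit switching:", circuit_switched_delay(size, links))
print("packet switching: ", packet_switched_delay(size, links))
```

For the same path, the store-and-forward delay is never smaller than the 
circuit-switched delay, because every link contributes its own transmission time 
instead of only the slowest link. 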

The well-known quickest path problem is to find a routing path in a network $G$ 
such that the end-to-end delay required to send $\sigma$ units of message from a 
source to a destination is minimum. Chen and Chin [4], Rosen et al. [16], Rao and 
Batsell [10-12], and Bang et al. [2] studied the quickest path problem using the 
circuit-switching mode. However, since the store-and-forward transfer mode is used to 
send messages in the Internet, the classical quickest path algorithm cannot be 
applied directly to an IP network. In this paper, we take into account the bandwidth 
and propagation delay of links to minimize the end-to-end delay of the path in an IP 
network. We compare the method based on the minimum number of hops with our method 
based on the minimum end-to-end delay, which utilizes the estimated bandwidths and 
propagation times of links. 



3 Estimation of Bandwidth and Propagation Delay 

We measure the bandwidth of a link (a, b), which can be a virtual link or a physical 
link in the underlying network, using the following steps: 

1. Generate various sizes for a TCP segment at node a; 

2. For each size s, construct a number of TCP segments of size s and send them 
to node b; 

3. Node b, upon receiving each TCP segment, simply echoes it back to node a; 

4. The round-trip delay is measured for each segment and the average is computed. 
The end-to-end delay is the round-trip delay divided by two. 

Once the average end-to-end delay is determined for messages of different sizes, 
linear regression is applied to determine the line $\sigma/B + d$ that fits the 
points as shown in Figure 3. From this computation, we obtain the “effective 
bandwidth” $B$ of 




NetLets: Measurement-Based Routing for End-to-End Performance over the Internet 189 



the link. To determine the propagation delay (the time for a minimum size message 
to travel along the link), we sent a minimum size message containing 21 bytes several 
times and averaged the measured delays. 



[Figure 3 plot: end-to-end delay versus message size (100, 200, ..., 3000); the slope 
of the fitted line is 1/Bandwidth.] 

Fig. 3. Determination of the end-to-end delay experienced by transport layer segments 
of various sizes. 
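
The regression step can be sketched as an ordinary least-squares fit of delay against 
message size, where the inverse of the slope gives the effective bandwidth. The code 
below is a minimal illustration with synthetic, noise-free data for a hypothetical 
link. The function names are assumptions. The intercept of the fitted line is 
returned only for comparison, since the text estimates the propagation delay 
separately by averaging minimum size messages. 

```python
def one_way_delay(round_trip_times):
    """Average the measured round-trip delays and halve the result (step 4)."""
    return sum(round_trip_times) / len(round_trip_times) / 2.0


def fit_effective_bandwidth(samples):
    """Least-squares line through (size, delay) points.

    Returns (effective bandwidth, intercept); the slope of the line is 1/bandwidth.
    """
    n = len(samples)
    mean_size = sum(size for size, _ in samples) / n
    mean_delay = sum(delay for _, delay in samples) / n
    sxx = sum((size - mean_size) ** 2 for size, _ in samples)
    sxy = sum((size - mean_size) * (delay - mean_delay) for size, delay in samples)
    slope = sxy / sxx
    intercept = mean_delay - slope * mean_size
    return 1.0 / slope, intercept


# Synthetic data for a hypothetical link of 1e6 bytes/s with a 2 ms delay.
samples = [(size, size / 1.0e6 + 0.002) for size in range(100, 3100, 100)]
bandwidth, intercept = fit_effective_bandwidth(samples)
print(f"effective bandwidth = {bandwidth:.0f} bytes/s, intercept = {intercept:.6f} s")
```

With real measurements the points scatter around the line, so the quality of the 
estimate depends on the number of segments sent per size. 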



4 NetLets Framework 

NetLets provide a software interface that applications use to route messages via 
the minimum end-to-end delay paths (more details on the overall framework of NetLets 
can be found in [15]). Based on the bandwidth $B$ and the propagation delay $D$ of 
each link in the virtual network, the minimum end-to-end delay path for a message of 
a particular size $\sigma$ is determined by assigning to each link the weight 
$\sigma/B + D$ and computing the shortest path in the resulting network using the 
one-to-all Dijkstra’s shortest path algorithm [9,13]. Clearly, given any source and a 
set of messages of sizes $\sigma_1, \sigma_2, \sigma_3$, we can construct the 
shortest path trees with respect to each message size, and if two trees are the same 
we can eliminate one of them. Once the shortest path trees are known, the routing 
tables are constructed. We perform this operation for all network nodes that house 
NetLets. 
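
The path computation described above can be sketched as follows. This is a minimal 
illustration and not the NetLets implementation: the graph representation, the 
function names, and the example network are assumptions. The example uses a four-node 
network with three links at 3.152 Mbit/s and one link (B-D) at 1.544 Mbit/s, each 
with a 10 microsecond propagation delay, which is an assumed reading of the network 
of Figure 2. 

```python
import heapq


def shortest_path_tree(graph, source, size):
    """One-to-all Dijkstra with link weight size / bandwidth + delay.

    graph maps node -> {neighbor: (bandwidth, delay)}. Returns (dist, parent).
    """
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, (bandwidth, delay) in graph[u].items():
            candidate = d + size / bandwidth + delay
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, parent


def next_hop_table(parent, source):
    """Routing table: for each destination, the first hop on the tree path."""
    table = {}
    for dest in parent:
        if dest == source:
            continue
        hop = dest
        while parent[hop] != source:
            hop = parent[hop]
        table[dest] = hop
    return table


def distinct_trees(graph, source, sizes):
    """Group message sizes that yield the same shortest path tree."""
    trees = {}
    for size in sizes:
        _, parent = shortest_path_tree(graph, source, size)
        key = tuple(sorted(parent.items(), key=lambda item: item[0]))
        trees.setdefault(key, []).append(size)
    return trees


# Assumed example network (bandwidth in bits/s, delay in seconds), undirected.
edges = [("A", "B", 3.152e6), ("B", "C", 3.152e6), ("C", "D", 3.152e6), ("B", "D", 1.544e6)]
graph = {node: {} for node in "ABCD"}
for u, v, bandwidth in edges:
    graph[u][v] = (bandwidth, 10e-6)
    graph[v][u] = (bandwidth, 10e-6)

small, large = 21 * 8, 65535 * 8  # message sizes in bits
_, parent_small = shortest_path_tree(graph, "A", small)
_, parent_large = shortest_path_tree(graph, "A", large)
print("21-byte message reaches D via", parent_small["D"])
print("65535-byte message reaches D via", parent_large["D"])
print("routing table at A:", next_hop_table(parent_small, "A"))
```

Grouping sizes by tree, as in `distinct_trees`, keeps only one routing table per 
distinct tree, which corresponds to eliminating duplicate trees. 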

The routing table contains, for each destination, the next hop IP address of the node 
containing the NetLets modules. All NetLets modules communicate using predetermined 
port numbers. After receiving a message from the local application, a NetLet sends 
the datagram using the designated port number and the IP address in the routing 
table. Upon receiving the message, a NetLet determines whether the packet is destined 
for it or for a node to which it can route, and performs the necessary actions. 
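
The forwarding decision made by a NetLet can be illustrated with a small sketch. The 
port number, the addresses, and the function name are hypothetical. Dropping a 
message for an unknown destination is also an assumption, since the text does not 
specify the action taken in that case. 

```python
NETLET_PORT = 5001  # hypothetical predetermined port; the text gives no number


def handle_message(local_addr, routing_table, dest_addr):
    """Return the action a NetLet takes for a message addressed to dest_addr.

    routing_table maps a destination address to the next hop NetLet address.
    """
    if dest_addr == local_addr:
        # The message is destined for this node: hand it to the local application.
        return ("deliver", None, None)
    next_hop = routing_table.get(dest_addr)
    if next_hop is None:
        # Assumed behaviour for destinations this NetLet cannot route to.
        return ("drop", None, None)
    # Forward the datagram to the next NetLet on the predetermined port.
    return ("forward", next_hop, NETLET_PORT)


table = {"10.0.0.3": "10.0.0.2"}
print(handle_message("10.0.0.1", table, "10.0.0.3"))
```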



5 Experimental Results 

For experimentation we selected two virtual network topologies, as shown in Figure 
4(a) and Figure 4(b). As evident in topology 1 (Figure 4(a)), there are four different 
paths from source to destination node with the minimum number of hops, namely 4. 
Using the link weight of $(\sigma/B + D)$ for each link, the minimum end-to-end delay 
path from the source L to the destination N is L → J → T → N. Also, the paths 
L → J → T → N, L → I → T → N, L → J → V → N, and L → I → V → N are 
paths with minimum number of hops. For topology 2 (Figure 4(b)), the paths with the 
minimum number of hops are I → J → T → N, I → J → V → N, and I → L → T → N, 
where I and N are the source and destination, respectively. The minimum delay path is 
I → J → T → N, which also has the minimum number of hops. 

For network topologies 1 and 2, Figures 5(a) and 6(a) show the observed end-to-end 
delays of the paths, obtained by sending messages of different sizes, and Figures 
5(b) and 6(b) show the end-to-end delays calculated by our method (namely, the 
summation of the link weights on each of the paths). Even though the observed 
end-to-end delay is different in magnitude from the calculated end-to-end delay, the 
two methods choose the same path from source to destination as the minimum 
end-to-end delay path. 




Fig. 4(a). Virtual topology 1 containing six nodes with bandwidth in bytes per second and 
propagation delay in seconds indicated on the links. 




Fig. 4(b). Virtual topology 2 containing six nodes with bandwidth in bytes per second and 
propagation delay in seconds indicated on the links. 



Fig. 5(a). For virtual topology 1 this graph contains the observed end-to-end delay for various 
paths. 




Fig. 5(b). For virtual topology 1 this graph contains the calculated end-to-end delay for various 
paths from source to destination. 




Fig. 6(a). For virtual topology 2 this graph contains the observed end-to-end delay for various 
paths. 



Fig. 6(b). For virtual topology 2 this graph contains the calculated end-to-end delay for 
various paths from source to destination. 



Fig. 7. Comparison of observed (actual) and computed end-to-end delay of the path 
LJTN in topology 1. 



In Figure 7 we show the observed and computed end-to-end delays. While there are 
clear differences in delay times, the minimum end-to-end delay paths chosen by com- 
putation and experimentation are exactly the same path. 



6 Conclusions 

We presented experimental results on the performances of the routing paths, compar- 
ing the minimum hops method with our method that computes minimum end-to-end 
delay paths based on measurements. Our results showed that the minimum delay path 
computed based on the estimated bandwidth and delays outperformed the path with 
minimum number of hops. This is because the end-to-end delay depends on the mes- 
sage size, propagation delay and bandwidth. We also proposed a software framework 
based on NetLets [14] that can be used for routing with probabilistically guaranteed 
performance. 

Future work could involve utilizing other regression estimation methods [11,15]. 
Also, in terms of experimentation using the Internet, NetLets were used in [14] for 
realizing two-paths between nodes that are geographically separated by thousands of 
miles. Our system can be expanded to incorporate more nodes, including nodes that are 
more widely distributed over the Internet. NetLets are complementary to and 
upward-compatible with QoS mechanisms such as MPLS, DiffServ, and IntServ, as well as 
adapted and/or optimized versions of transport mechanisms such as auto-tuned 
TCP [18]. It would be interesting to see if existing NetLets can be enhanced to 
exploit these mechanisms to provide end-to-end performance beyond what is possible in 
current IP networks. Also, active networks [1] can be exploited to provide more 
information and control to the NetLets by attaching measurement and routing codes to 
the messages. 



Abstract. The paper presents some socio-economic and technical issues that 
arise from the design of an open, multimodal end-to-end tracking & tracing 
system. An outline of the general background and the project’s goals is 
followed by a discussion of the associated socio-economic aspects. User 
requirements are also addressed, with respect to the business context of the 
logistics environment. Subsequently, a brief overview of the project’s technical 
design choices is given, followed by a more detailed discussion of the 
individual components of this architecture. 



1 Background and Goals 

Europe has the most advanced transport infrastructure of the world, including a very 
dense road network, a modern rail network and an air network that covers the entire 
continent. Europe also has very advanced communication networks, consisting of 
dense, high-quality fixed networks, satellite coverage, and wireless communication 
networks, such as GSM and - in the near future - GPRS (General Packet Radio 
Service) and UMTS (Universal Mobile Telecommunication System). Unique 
opportunities will be generated when these two infrastructures are connected to each 
other. 

Transport and logistics today have evolved into a high-technology industry. 
Distribution is no longer about moving cargo over road or via air from A to B, but is a 
complex process based on intelligent systems for sorting, planning, routing, and 
consolidation that supports faster transportation, different transportation modes, 
fallback scenarios in case of failures, value added services such as time sensitive 
deliveries and tracing of products throughout the supply chain or transport network. 

Many large logistics companies have developed solutions for delivering these 
services in order to meet the requirements of their customers and to improve their 
services. Smaller companies, however, cannot afford these investments and are 
mainly active in the ‘old’ point-to-point transportation market, or co-operate with the 
larger companies, using their respective systems. 

The companies that have the necessary information systems in place to participate 
in the market for high-end transport solutions normally offer their customers methods 
for tracking and tracing their consignments. Even though many customers would 
benefit from using this information in their own information systems, only a few of 
them are doing this today because of the large investments required to adapt their 
systems to the proprietary interfaces of the transport companies. Moreover, these 
systems typically have two major drawbacks: 

- They do not normally work across company boundaries. 

- They do not provide accurate ‘live’ information about location and, particularly, the 
status of individual units or items. 

That is, continuous information about the current position or status of transport 
goods (in the sense that the geographic position can be queried at any time) is not 
commonly available today. Existing solutions are typically based on scanning bar 
codes at process or control points. Moreover, information at item level is not normally 
available either. Typically, this information is provided - if at all - at a vehicle or 
container level only. Furthermore, very few companies have true global or even 
European coverage. In daily business, products are frequently shipped by 
subcontractors of the transport company, which frequently means that tracking and 
tracing at least becomes much more difficult, and typically will no longer be possible 
at all. Only in a few cases do carriers exchange tracing information, but in most cases 
the costs for adapting the proprietary systems to each other are prohibitive. 

The key idea of the ParcelCall project is to provide relevant services on top of 
TCP. This allows considerable freedom with respect to the underlying network 
protocols, potentially including, e.g., GSM/GPRS, ISDN, UMTS, and IP. Easy 
integration of the legacy systems operated by the individual carriers into the new 
information infrastructure is another key design criterion. Seamless interoperation 
between these systems on the one hand and the new tracking & tracing system on the 
other has to be guaranteed. 

Furthermore, and in addition to the track & trace services typically provided today, 
ParcelCall will enable [1]: 

- item level tracking & tracing 
That is, the granularity of the track & trace function will be adjustable, from the 
individual item (e.g., a parcel) up to a container or a vehicle. 

- ‘near-real-time’ (continuous) tracking 
Status and position information will be made available at any time (not just at, e.g., 
terminals or hubs). 

The remainder of the paper is organised as follows: Chapter 2 briefly introduces 
the socio-economic context of the project. An overview of the user requirements from 
a business perspective is given in chapter 3. Subsequently, chapter 4 introduces the 
overall system architecture and its individual elements. Finally, some conclusions are 
presented in chapter 5. 



2 The Socio-economic Context 

The socio-economic environment within which freight forwarders and logistics 
companies operate is briefly outlined in this chapter. 

The success of a new technology depends on more than simply its technical 
efficacy; it must also be matched with its socio-economic context. In some cases this 
means tailoring technology to the existing environment, in others the market and 
context may need to be ‘created’, alongside the technology, by the technology’s 
developers. Most obviously, a technology must address the requirements of its various 
users. Typically, it is important that current needs, as seen in existing business 
practices, are taken into account. However, although existing practices provide a 
starting point, gaining the full benefit of new technology often depends on its more 
radical application. 

Tracking & tracing systems need to address the requirements of two main types of 
transportation: business-to-business and business-to-consumer. While the former 
increasingly hinges on efficient logistics management, a key issue for the latter, 
especially as regards the growth of e-commerce, is customer satisfaction. 

High quality tracking and tracing of parcels matters for business-to-business 
transportation because of the trend towards inventory reduction. The speed, reliability 
and timeliness of delivery have increasing commercial salience both in procurement 
and in the quality of service offered by a supplier. Enhanced logistics management 
based on Just-in-Time, Vendor Managed Inventory or similar approaches can not only 
minimise stocks held, but can also involve outsourcing logistics management either to 
the supplier or to a specialist logistics operator. With e-commerce, boundaries 
between different ‘stages’ in the supply chain may become eroded. Distributors may 
take on extended roles; for example, in fulfilment and final assembly. 

Improvements in tracking and tracing can also play a significant part in eliminating 
one of the problems faced by Internet shopping - reliable, time-assured delivery, 
tailored to customer requirements. Although not unique to Internet shopping, 
heightened customer expectations along with internet/WAP access provide an 
opportunity for improved customer service using more accurate parcel tracking 
technology. 

While fulfilling these business applications is central, it is also important to 
recognise that there are a variety of other socio-economic issues that may affect the 
technology’s success. Security and confidentiality may be important. Above all, a 
technology that involves inter-organisational data exchange depends heavily on the 
success of standardisation efforts and on the willingness of firms to work together. 
These issues may affect the technical choices adopted in the design and configuration, 
as well as the commercial strategies for its promotion. Strategic thinking on these 
lines is embedded in the architecture and strategy of the ParcelCall project [2], [3]. 



3 User Requirements 

The user requirements discussed in this section relate rather more to the underlying 
business processes than to a technical architecture or a specific realisation. 

Attaining technical objectives will be of little significance if the technology itself is 
not widely implemented. Although individual companies could benefit from its local 
adoption, a system’s full potential lies in the development of a standardised approach 
that can gain general acceptance in the industry. Success will not depend simply on 
the development of the ‘best’ technology; equally important is the development of a 
constituency of users. The system will depend upon aligning expectations to ensure 
that a sufficient number of key users (critical mass) will be convinced to take part. It 
is crucial to convey that this represents the way forward, to win these kinds of 
commitments. 

Thus, it is crucially important to recognise the diversity of players involved, with 
their very different commitments and needs. The development of a new 
Inter-Organisational Network System may involve an uneven distribution of costs and 
benefits between these players [4]. In particular, it is important to ensure low barriers 
to entry - particularly for those players for whom a sophisticated track & trace system 
does not offer significant immediate benefits or strategic importance. 

Senders, receivers, and carriers are the main users of a tracking & tracing 
system. Their respective requirements are discussed in this section. Other users 
include Transport Brokers, Packaging Services, Collection and Delivery Services, 
Depot/Hub/Terminal operators, and Vehicle Drivers. 

Individual senders will typically take the parcel to a collection office. Home 
collection could also be possible, and it should be possible to arrange this service 
through Internet/WAP access. The sender would like the options of email confirmation 
of parcel delivery, and Internet/WAP access to transit status and estimated time of 
arrival. 

The requirements of company senders will vary according to their business 
practices, and companies that are supplying goods to individuals may have similar 
requirements to individual senders. They will want to receive as much status 
information as possible because this can then be provided to their customers, 
providing value added to their service. Likewise, reliability in delivery times (or 
flexibility in rearranging them) is important, as would be the ability to confirm that 
the parcel has been received by the appropriate individual. 

Individual receivers want to know when a parcel will arrive so that they can ensure 
that someone is there to receive it. Internet/WAP access and email messages can 
provide an attractive customer service, and are, for example, likely to be an important 
aspect of the development of Internet shopping. This service could include ‘real time’ 
information on the parcel’s movement, with updates to the estimated time of arrival 
being the key feature. 

In the case of corporate receivers, their dealings with senders will often be part of 
long-term supply relationships. For example, in B2B e-commerce, many 
manufacturing receivers may use EDI to send orders or call-offs based on long-term 
supply agreements and these may be generated directly from their internal systems. 
ID tags should have the potential to satisfy any of the receivers’ internal requirements 
for tracking the parcel. 

Companies providing express delivery, freight-forwarding and logistics 
management will be the main users. Their customers (senders and receivers) may 
have certain data requirements (as noted above), but mainly they will want a high 
level of performance and service to be provided to them seamlessly and transparently. 
While a number of large, integrated carriers mainly use their own transportation 
(planes, trucks, etc), even they typically need to subcontract the physical carriage 
some of the time. 

Parcels must carry an ID tag. As with existing track & trace systems, this tag will 
be read at each control point - typically the hand-over between transportation units 
when arriving at or leaving a depot. With active tags, the reading of tags can be 
continuous, or at customer request rather than solely at handover points. 

The carrier also requires regular feedback from moving transport units where this 
is possible. This information will comprise location and transport unit status (is it on 
time? what is the expected delay?), along with all the parcels carried and their routing, 
destination and estimated time of arrival. 

Delays or route deviations will be identified by the system. It should also be 
possible for deviations to be manually notified by the transport operator. If ‘thinking 
tags’ are used, then any undesirable deviations in the status of the parcel (in 
temperature, for example) should result in alerts to both the carrier’s main system and 
directly to the transport unit operator so that remedial action may be taken as soon as 
possible. Issues arise about whether to standardise these messages and, if proprietary 
and encrypted messages are being transmitted, whether carriers will feel happy to pass 
on to third parties information that they cannot themselves understand. 



4 The Architecture 

This chapter discusses the system architecture (see Figure 1). An overview of the 
specific architecture designed by, and deployed within, the ParcelCall project is 
followed by a description of the individual components. 

A Mobile Logistics Server (MLS), located on board a vehicle, collects information 
on the individual items, including position and status. The former is obtained via the 
Global Positioning System (GPS); ‘intelligent’ tags are utilised to collect the latter. 
These ‘Thinking Tags’, which have also been developed within the project, can form 
ad-hoc networks that can be applied to self-adapting hierarchical packing schemes or to 
active status monitoring of critical freight contents. Alarm messages will be actively 
generated if, e.g., an item enters a critical state (temperature, humidity, pressure, 
acceleration, etc.). 




Fig. 1 : The System Architecture 



The MLS sends the compiled information to a Goods Tracing Server (GTS). Every 
participating company (e.g., freight forwarders, logistics service providers, fleet 
operators, etc.) needs to install at least one GTS, which also serves as the interface 
between the respective internal IT system and the track & trace service. Thus, the set 
of GTSs forms a highly distributed database holding the information available to the 
end-users (subject, of course, to appropriate access rights and successful 
authentication). The individual servers are interconnected via public networks (e.g., 
the Internet or ISDN). It should be noted that even very small companies which do 
not have their own tracking and tracing system can utilise the ParcelCall service, as a 
GTS (typically a PC), one (or a few) MLSs, and some ‘thinking tags’ are pretty much 
the only additional pieces of hardware required. Customers can access the system 
through a ‘Goods Information Server’ (GIS), whose tasks also include authentication 
and access control. The elements of this architecture are discussed below in more 
detail. 



4.1 The Mobile Logistic Server 

To provide the required services, each transport unit needs to be equipped with a 
Mobile Logistic Server (MLS; see Figure 2) which keeps track of the goods within 
that unit. A transport unit may, for instance, be a truck, a freight wagon, or a 
container. 




Fig. 2: Structure of a Mobile Logistic Server 



Each such unit may contain transport goods or other units, thus potentially forming 
a hierarchy of transport units. Since each unit might have a MLS, the transport unit 
hierarchy causes an equivalent MLS hierarchy. Except for the top level MLS (root of 
the hierarchy tree) all MLSs communicate with their respective superior and 
subordinate MLSs. 


From an MLS’s point of view there is no difference between a tag and an MLS. A 
‘normal’ MLS only needs to implement an item interface. Only a top level MLS 
must implement an item interface and a GTS interface. 

Both ‘thinking tags’ and MLSs store a unique item identifier, the item’s destination 
address, and certain other information, e.g., constraints on temperature, shocks, 
or humidity. If a threshold of one of these constraints is exceeded, an event will be 
generated and passed to a superior MLS, which forwards the event to its associated 
GTS. The GTS, in turn, forwards the event to the carrier’s IT system (which may or 
may not react upon the event). If a transport unit within this message chain has a 
control system (e.g., for a refrigeration unit), this control system can register with its 
associated MLS to receive events. Thus, transport units can also react immediately to 
certain events. This is of crucial importance as a transport unit, for instance a vehicle, 
might be disconnected from the carrier’s IT system (due to, e.g., poor GSM coverage) 
and thus cannot receive any instructions. 
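
The event chain described above can be sketched in a few lines. This is an 
illustration under stated assumptions and not the ParcelCall implementation: the 
class names, the event format, and the use of plain callbacks for the GTS and for 
local control systems are all hypothetical, and constraints are modelled only as 
upper limits. 

```python
class Node:
    """Common base: tags and MLSs both appear as 'items' to a superior MLS."""

    def __init__(self, item_id, parent=None):
        self.item_id = item_id
        self.parent = parent


class MLS(Node):
    def __init__(self, item_id, parent=None, gts=None):
        super().__init__(item_id, parent)
        self.gts = gts        # callback to the GTS; only used by a top level MLS
        self.listeners = []   # local control systems registered for events

    def register(self, callback):
        """Let a local control system (e.g. a refrigeration unit) receive events."""
        self.listeners.append(callback)

    def on_event(self, event):
        # Notify local control systems first, so they can react without the GTS.
        for callback in self.listeners:
            callback(event)
        if self.parent is not None:
            self.parent.on_event(event)   # pass the event to the superior MLS
        elif self.gts is not None:
            self.gts(event)               # top level MLS forwards to its GTS


class ThinkingTag(Node):
    def __init__(self, item_id, parent, limits):
        super().__init__(item_id, parent)
        self.limits = limits  # assumed form: {"temperature": maximum allowed value}

    def report(self, quantity, value):
        """Generate an event if the reading exceeds the stored limit."""
        limit = self.limits.get(quantity)
        if limit is not None and value > limit:
            self.parent.on_event({"item": self.item_id, "quantity": quantity, "value": value})


gts_log = []         # stands in for the GTS and the carrier's IT system
fridge_alerts = []   # stands in for a refrigeration control system
truck = MLS("truck-1", gts=gts_log.append)
container = MLS("container-7", parent=truck)
container.register(fridge_alerts.append)
tag = ThinkingTag("parcel-42", container, {"temperature": 8.0})

tag.report("temperature", 5.0)    # within the limit: no event
tag.report("temperature", 11.5)   # limit exceeded: event travels up the hierarchy
print(gts_log)
```

Because the registered control system is notified before the event leaves the 
vehicle, it can react even when the link to the carrier’s IT system is down. 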

Both goods and transport units within a container are subsequently referred to as ‘items’. 

Unloading and re-loading of transport units changes the MLS hierarchy. To keep 
track of such changes, items are scanned while being loaded. As the carrier’s IT 
system plans the loading process in advance, the ParcelCall system can check the 
loading procedure while scanning the items. Therefore, each MLS can receive a 
loading list of ‘its’ items in advance from a GTS. While an item is scanned, the MLS 
checks whether or not it is on its loading list and generates an alarm message if a 
‘wrong’ item has been detected. Likewise, the responsible GTS is informed when the 
loading procedure has been completed successfully. 
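The loading check can be sketched as a comparison of scanned identifiers against the loading list. The function name `scan_items`, the item identifiers and the rule that loading counts as complete only when every listed item was scanned and no alarm was raised are assumptions made for this illustration.

```python
def scan_items(loading_list, scanned):
    """Return (alarms, complete) for one loading procedure."""
    expected = set(loading_list)
    seen = set()
    alarms = []
    for item_id in scanned:
        if item_id in expected:
            seen.add(item_id)
        else:
            alarms.append(f"wrong item: {item_id}")   # alarm for an unexpected item
    complete = seen == expected and not alarms
    return alarms, complete


alarms, complete = scan_items(["A1", "A2", "A3"], ["A1", "X9", "A2", "A3"])
ok_alarms, ok_complete = scan_items(["A1", "A2"], ["A2", "A1"])
```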



4.2 The Goods Tracing Server 

The GTS network forms the backbone of the ParcelCall system (see Figure 3). It 
interconnects and integrates the individual ParcelCall servers on the one hand, and the 
individual carriers’ IT systems on the other. A GTS comprises two databases: a 
Home Database and a Transitory Database. The Home Database contains item 
information, e.g. an item’s current position, its expected time of delivery, etc. The 
Transitory Database holds the MLS hierarchies. 




Fig. 3: GTS Internal Architecture 



The Home Database. As the ParcelCall system must scale, it is impossible to store 
information on every item within each Home Database. A GTS is responsible for a 
fixed number of top level MLSs (top level MLSs cannot move). When an item enters 
the ParcelCall system for the first time it must be either registered with the GTS or 
with an MLS. The latter forwards registration information to the local GTS, which 
stores the item status information and the identification of the responsible top level 
MLS in its Home Database. Note that the top level MLS which is currently 
responsible for that item might belong to another GTS. In that case it is also necessary 
to store the identification of this GTS. Thus, the Home Database contains information 
of those items which have entered the system within the responsibility of the GTS. 

A GTS receiving information about a certain item which is not registered at its 
Home Database forwards this information to the responsible GTS (the ID of which is 
stored in the tag). Having both the unique ID of an item and the ID of the responsible 
GTS it is straightforward to retrieve information about this transport good. 



The Transitory Database. It contains several MLS hierarchies (one for each top level 
MLS). To request instant status information of a certain item, the request must be 
routed through the MLS hierarchy. Having obtained the ID of the responsible top 
level MLS, it is straightforward to retrieve the required routing information from the 
Transitory Database. Note that the Transitory Database has one entry for each item 
which is currently under control of the GTS. 
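The interplay of the two databases can be sketched as follows. The class `GTS`, its method names, the dictionary layout and all identifiers such as `gts-berlin` are hypothetical; the sketch only assumes what the text states, namely that an item is looked up by its own ID together with the ID of its responsible GTS.

```python
class GTS:
    """Hypothetical model of one Goods Tracing Server."""

    def __init__(self, gts_id, network):
        self.gts_id = gts_id
        self.network = network            # shared dict: gts_id -> GTS
        self.home_db = {}                 # item_id -> {"status", "top_mls", "mls_gts"}
        self.transitory_db = {}           # item_id -> path through the MLS hierarchy
        network[gts_id] = self

    def register(self, item_id, status, top_mls, mls_gts=None):
        # Items are recorded in the Home Database of the GTS where they entered.
        self.home_db[item_id] = {
            "status": status,
            "top_mls": top_mls,
            "mls_gts": mls_gts or self.gts_id,
        }

    def update(self, item_id, home_gts, status):
        # Forward information about items registered elsewhere to their home GTS.
        target = self if home_gts == self.gts_id else self.network[home_gts]
        target.home_db[item_id]["status"] = status

    def route(self, item_id, home_gts):
        # Look up the GTS currently in charge, then its Transitory Database entry.
        record = self.network[home_gts].home_db[item_id]
        return self.network[record["mls_gts"]].transitory_db[item_id]


network = {}
berlin, paris = GTS("gts-berlin", network), GTS("gts-paris", network)
berlin.register("item-7", "accepted", top_mls="mls-hub-3", mls_gts="gts-paris")
paris.transitory_db["item-7"] = ["mls-hub-3", "mls-truck-12", "mls-container-4"]
paris.update("item-7", "gts-berlin", "in transit")
path = paris.route("item-7", "gts-berlin")
```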



The Carrier’s IT System. Among others, the GTS has an interface to a carrier’s IT 
system. This system provides delivery plans to the ParcelCall system which are used 
to establish the MLS hierarchy. On the other hand, the GTS forwards status 
information and alarm messages to the carrier’s system. 



4.3 The Goods Information Server 

The Goods Information Server (GIS; Figure 4) provides customers with status 
information about their transport goods. To this end a GIS connects to a carrier’s GTS 
to retrieve this information. Basically, the GIS displays the information received from 
the GTS network to the user. Therefore, it implements a multimedia converter which 
allows conversion of different. The GIS also performs user authentication and 
manages the access control. 

To retrieve information about a certain transport good a user authenticates to the 
GIS and provides the ID of the good as well as the ID of the responsible Home 
Database. Having both IDs it is straightforward to retrieve the information from the 
GTS network. 



[Figure 4 diagram. Legible labels: End User Devices; GIS API; GIS Manager; 
Authentication Manager; Presentation Manager; Communication Manager; TCP/IP; 
SMS; WAP; GSM/GPRS] 

Fig. 4: The GIS Architecture 



4.4 Passive and Active Tags 

Passive RFID (Radio Frequency Identification) tags are available today at moderate 
costs, and can easily be integrated into labels holding e.g. bar code information. Due 
to the costs (compared to simple printed tags) and infrastructure requirements 
(printers, readers), this technology has yet to gain widespread acceptance for tagging 
of short life-cycle products and low value transactions. However, we believe that it is 
only a matter of a few years before RFID tags will play an active role in global 
markets. Static RFID tags - with limited data capacity and read-only access - can 
already be printed using standard printers with special ink, without the need of 
integrating any hardware (chips). RFID tags with read/write capability and a capacity 
of a few hundred bytes are available as low-cost one-chip solutions. 

Complementing bar code labels with RFID tags will enable automatic 
identification ‘on the fly’ without the need for manual consignment handling and 
label scanning. As a further step, ‘Thinking Tags’ could be used instead of passive 
RFID tags. Such tags, which have been developed within the project, combine active 
short-range communication capabilities with sensing, memory, and computing power. 
Key issues in their design include low power consumption and low costs. 

Thinking Tags will offer opportunities far beyond the mere transmission of static 
identification information, including, but certainly not limited to: 

• continuous measuring and monitoring of environmental conditions (temperature, 
humidity) for sensitive shipments (e.g., frozen food) at individual item level, 

• active alerting of the owner of a shipment in case of an alarm (deviation from 
the planned transport route, inadequate environmental conditions, etc.), 

• recording of the history (location, environmental conditions, status) of a shipment 
in order to provide evidence in case of liability issues. 



5 Conclusions 

In this paper we presented the ParcelCall approach towards an open architecture for 
tracking and tracing in transport and logistics. 

The de-centralised architecture has several attractive features with respect to the 
identified requirements. Most importantly, this architecture scales extremely well; it is 
no problem to install an additional server if need be. Almost as important, there is no 
need to modify existing corporate IT infrastructures. The only thing that needs to be 
done is to specify and implement an interface between the infrastructure and the GTS. If 
required, incoming information (from the MLS) can first be processed internally 
before it is made available to the public via the GTS network (for instance, if exact 
location information must not be made available for security reasons). Moreover, 
small companies can compete on a more level playing field. 

Internal details, such as change of transport mode or use of a sub-contractor are 
hidden from the end-user, to whom a virtual global delivery system is presented. 
(Mobile) end-users (i.e., consignors and consignees) can obtain information about a 
consignment from the Goods Information Server (GIS). The GIS holds the individual 
user profiles, checks and verifies a user’s identity, forwards the query to an 
appropriate GTS and returns the response to the user’s current end system. 

We believe that customers will benefit from improved information on their 
shipments; the potential benefits include, but are not limited to, improved planning 
and better management of supply chains and inventories. 

Likewise, the European transport and logistics industry will greatly benefit from a 
unified architecture for the exchange of continuous tracing information. It will enable 
the deployment of new products and services and the improvement of existing ones. 



Abstract. This paper proposes an architecture for distributed management 
of upper layer protocols and network services called Trace. 
Based on the IETF Script MIB, the architecture provides mechanisms 
for the delegation of management tasks to mid-level managers, which 
interact with monitoring and action agents to have them executed. The 
paper introduces PTSL (Protocol Trace Specification Language), a 
graphical/textual language created to allow network managers to specify 
protocol traces. The specifications are used by mid-level managers to program 
the monitoring agents. Once programmed, these agents start to monitor 
the occurrence of the traces. The information obtained is analyzed by 
the mid-level managers, which may ask action agents for the execution of 
procedures (Perl scripts), making the automation of several management 
tasks possible. 



1 Introduction 

The use of computer networks to support a growing number of businesses and 
critical applications has stimulated the search for new management solutions that 
maintain not only the physical infrastructure, but also the protocols and services 
that flow over it. The popularization of electronic commerce (e-commerce) and 
the increasing use of this business modality by companies, for instance, imply 
using the network to exchange critical data from the organization and from its 
customers. Protocols and services that support these applications are critical 
and, therefore, need to be carefully monitored and managed. 

Not only critical applications require special attention. New protocols are 
frequently released to the market to support an increasing set of specific 
functionalities. These protocols are quickly adopted by network users. As a result 
of this fast proliferation, weakly tested and even faulty protocols are 
disseminated to the network user community. In several cases these anomalies, as 
well as the miscalculated use of resources, cause network performance 
degradation and go unnoticed. 

We believe that most of the research carried out so far tries to provide 
mechanisms to guarantee higher availability and performance for networks (e.g. the 
work of Hood and Ji on proactive fault detection). While solutions to manage the 
physical network infrastructure are established and tested, ways to provide 
effective management of applications and protocols still need to be investigated. 

Existing management tools are not completely prepared to allow the 
monitoring of these new applications and protocols. Most of the tools only allow the 
monitoring of a closed set of them. The ability to observe new ones depends on 
a firmware update of the monitoring hardware (e.g. RMON2 probes) or 
on programming in low-level languages, as in the extensible probe architecture 
proposed by Malan and Jahanian. Due to the complexity of the task, most 
network managers neglect this possibility. 

In some approaches it is possible to recognize and count packet flows 
specified by simple filtering rules (e.g. tcpdump-like filters used by ntop) or by 
descriptive languages such as SRL, used by NeTraMet. However, these 
filtering languages lack constructs that allow a rule to be defined as a sequence of 
packets, each with specific filtering options, making it impossible to accomplish 
time-based or correlated analysis of flows. 

Other solutions such as Tivoli Enterprise are intrusive, since they require 
that developers insert specific monitoring procedure calls while developing 
applications. This approach is only suitable for applications developed in-house. It 
cannot be used to manage proprietary protocols (e.g. web browsers and servers 
and e-mail clients and servers). Besides that, one must invest in personnel 
training to use the monitoring APIs. 

Regarding the type of information gathered by monitoring engines, some 
approaches such as the IETF RMON2 MIB (Remote Network Monitoring 
Management Information Base version 2) store, for a pre-defined set of high-layer 
protocols supported by the probe, the number of packets sent/received by a 
host or exchanged by host pairs. Gaspary et al. describe the advantages 
and limitations of the RMON2 MIB. One of the RMON2 weaknesses is that it 
does not store any information related to performance, although this has been 
discussed by the Remote Network Monitoring group at the IETF. 

Finally, we should point out that many management tools are limited to 
monitoring, and the network manager has to take actions manually when 
unexpected behaviors from these protocols are observed. 

In this paper we present Trace, an architecture for distributed management 
of enterprise networked applications, high-layer protocols and network services 
based on the IETF Script MIB. Through a graphical and textual 
language based on finite state machines, the network manager defines protocol 
traces to be observed. These specifications are received by one or more 
programmable agents that immediately start to check whether a defined trace 
occurs or not. The observation of these traces in the network traffic triggers 
actions, which are also determined by the network manager. 

The paper is organized as follows: section 2 describes the language to specify 
protocol traces. In section 3 the architecture is presented. Section 4 illustrates 
how to accomplish fault management using the architecture. In section 5 we 
present a summary and concluding remarks. 



2 Protocol Trace Representation Using PTSL 



In this section we propose PTSL (Protocol Trace Specification Language), a 
graphical and textual language for the representation of high-layer protocol 
traces. The two forms are not equivalent. The textual one makes the complete 
representation of a trace possible, including the specification of both the state 
machine and the events that trigger the transitions. The graphical one, on the 
other hand, can represent the state machine but only label the events that 
trigger the transitions. 



2.1 Organization of a Specification 

The textual specification of a trace begins with the keyword Trace and ends 
with EndTrace. Initially, the manager may add some optional items to the 
specification (see figure 1, lines 2-7). Next, the specification is broken down 
into three sections: MessagesSection (lines 8-10), GroupsSection (lines 11-13) 
and StatesSection (lines 14-16), where the messages to be observed, the groupings 
and the state machine that describe the trace are respectively specified. 



     1  Trace "Successful WWW access" 
     2  Version: 1.0 
     3  Description: WWW access with 200 response. 
     4  Key: HTTP, 200, OK 
     5  Port: 80 
     6  Owner: Luciano Paschoal Gaspary 
     7  Last Update: Fri, 23 Sep 2000 15:15:03 GMT 
     8  MessagesSection 
     9 
    10  EndMessagesSection 
    11  GroupsSection 
    12 
    13  EndGroupsSection 
    14  StatesSection 
    15 
    16  EndStatesSection 
    17  EndTrace 



Fig. 1. Schematic representation of a textual specification. 
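As an illustration of how the header of such a textual specification might be read, the following sketch collects the trace name and the optional items that precede the first section. It is not the parser used by the authors: the function `parse_header`, the regular expression and the choice to accept either `:` or `;` after an item name are assumptions.

```python
import re

HEADER_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*[:;]\s*(.*?)\s*$")
SECTIONS = ("MessagesSection", "GroupsSection", "StatesSection")


def parse_header(text):
    """Collect the trace name and the optional header fields of a specification."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Trace "):
            header["Trace"] = line[len("Trace "):].strip().strip('"')
        elif line in SECTIONS:
            break                      # the header ends where the first section starts
        else:
            match = HEADER_LINE.match(line)
            if match:
                header[match.group(1)] = match.group(2)
    return header


SPEC = '''Trace "Successful WWW access"
Version: 1.0
Description: WWW access with 200 response.
Key: HTTP, 200, OK
Port: 80
MessagesSection
EndMessagesSection
EndTrace'''

header = parse_header(SPEC)
port = int(header["Port"])
```

A monitoring engine could use the extracted `Port` value to pre-filter packets before any message matching takes place.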



If the trace to be monitored belongs to a single application-layer protocol, 
the network manager may specify the TCP or UDP port number using the Port 
parameter (line 5). This simplifies packet classification during the monitoring 
phase. 


2.2 State Machines 

The trace of a protocol is defined through a finite state machine. The network 
manager may define a model to monitor just a part of or the whole protocol, or 
interactions that span more than one protocol. Figure 2 shows two trace 
examples. In the first example (a), the manager is interested in monitoring the 
successful accesses to a WWW server. The trace shown in (b) does not describe a 
single protocol; rather, it is made up of a name resolution request (DNS protocol) 
followed by an ICMP Port Unreachable message. This trace occurs when the 
host where the service resides is on, but the named daemon is not running. 




Fig. 2. Graphical representation of a trace: (a) successful WWW request; (b) DNS 
request not replied because the named daemon is not executing. 



As one can see, states are represented by circles. The initial state has the label 
idle associated with it. The final state is represented by two concentric circles. 
In both examples the initial and final states are the same (idle). Transitions 
are represented by unidirectional arrows. A continuous arrow indicates that the 
transition is triggered by the client host, whereas a dotted arrow denotes that 
it is caused by the server host. The text associated with a transition only labels 
the event (specified as a message or grouping in the textual language) that will 
trigger it. This means that the complete specification of a transition can only be 
done using the textual language. The graphical representation of the state machines 
shown in figure 2 can be mapped to the textual specification presented in figure 3. 
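A trace of this kind can be followed with a very small state machine. The sketch below is only an illustration: the class `TraceMachine` is hypothetical, events are assumed to arrive as already classified labels, and events that do not fire a transition in the current state are simply ignored, which is a simplification.

```python
class TraceMachine:
    """Minimal finite state machine for one trace (illustrative only)."""

    def __init__(self, transitions, initial="idle", final="idle"):
        self.transitions = transitions    # {(state, event label): next state}
        self.initial = initial
        self.final = final
        self.state = initial
        self.occurrences = 0

    def feed(self, label):
        nxt = self.transitions.get((self.state, label))
        if nxt is None:
            return                        # event does not fire a transition here
        self.state = nxt
        if nxt == self.final:
            self.occurrences += 1         # one complete pass through the trace
            self.state = self.initial


www = TraceMachine({("idle", "GET"): "2", ("2", "HTTP/1.1 200"): "idle"})
for label in ["GET", "HTTP/1.1 200", "HTTP/1.1 200",
              "GET", "HTTP/1.1 404", "HTTP/1.1 200"]:
    www.feed(label)
```

The counter `occurrences` plays the role of the statistic that a monitoring agent would expose for the trace.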

2.3 Transitions 

In addition to making a high-level representation of traces, it is necessary to 
describe what causes the change of states. Before describing the adopted 
solution, it is important to highlight that high-layer protocols are specified in 
many different ways. Larmouth classifies them as character-based or binary-based. 
Character-based protocols are defined as a set of text lines coded in ASCII (e.g. 
HTTP and SMTP). Binary protocols, on the other hand, are defined as strings 
of octets or bits (e.g. TCP). 



    FinalState: idle 
    State idle 
      "GET" GotoState 2 
    EndState idle 
    State 2 
      "HTTP/1.1 200" GotoState idle 
    EndState 2 

    (a) 

    FinalState: idle 
    State idle 
      "DNS request" GotoState 2 
    EndState idle 
    State 2 
      "ICMP message" GotoState idle 
    EndState 2 

    (b) 



Fig. 3. Textual representation of state machines. 


Considering the differences between both protocol types, we propose that state 
transitions be represented by a positional approach. Taking the example 
shown in figure 2a, figure 4a shows how to represent the transition 
HTTP/1.1 200. 



    1 Message "HTTP/1.1 200" 
    2   MessageType: server 
    3   MessageTimeout: 5000 
    4   // OffsetType  Encapsulation    FieldNumber  Verb      Description 
    5   FieldCounter   Ethernet/IP/TCP  0            HTTP/1.1  "Protocol version" 
    6   FieldCounter   Ethernet/IP/TCP  1            200       "Successful access" 
    7 EndMessage 

    (a) 

    Message "DNS request" 
      MessageType: client 
      // OffsetType  Encapsulation    FirstBit  NumberOfBits  Verb  Description 
      BitCounter     Ethernet/IP/UDP  16        1             0     "Field QR" 
      BitCounter     Ethernet/IP/UDP  17        4             0000  "Field OPCODE" 
    EndMessage 

    (b) 



Fig. 4. Representation of (a) character-based and (b) binary protocol fields. 



As the transition is expected to be triggered by the server host, one must set 
the MessageType field to server (line 2). Since both protocol fields (HTTP/1.1 
and 200) belong to a character-based protocol, the search for their positions 
within the packets is made by fields (FieldCounter, lines 5-6). In this example, 
HTTP/1.1 is the first string that appears in the message and therefore its offset 
is 0 (third parameter in line 5). The second string to appear is 200 and its offset 
is 1 (line 6). For each protocol field defined in a message it is also necessary to 
state where to look for it (encapsulation Ethernet/IP/TCP, lines 5-6). 

When the transition is caused by a binary protocol, the offset is given 
in bits (BitCounter). In this case, it is necessary to state where the field 
starts (FirstBit) and the number of bits to be observed from this offset on 
(NumberOfBits). A standard DNS request can be recognized by two fields: QR 
(when set to 0 indicates a query sent to the server) and OPCODE (when set to 0 
represents a standard query). Field QR is 16 bits away from the beginning of 
the header and its size is 1 bit. Field OPCODE starts at bit offset 17 and 
occupies 4 bits. Figure 4b shows the textual representation of a standard DNS 
request. 
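The two positional checks can be sketched as follows. The helper names `field_matches` and `bits_match` are assumptions, as is the choice to treat whitespace as the field separator of character-based protocols. The sample DNS header follows RFC 1035, in which a query carries QR = 0 and a standard query carries OPCODE = 0000.

```python
def field_matches(payload: bytes, field_number: int, expected: str) -> bool:
    """FieldCounter-style check: compare the n-th whitespace-separated token."""
    tokens = payload.decode("ascii", errors="replace").split()
    return field_number < len(tokens) and tokens[field_number] == expected


def bits_match(payload: bytes, first_bit: int, number_of_bits: int, expected: str) -> bool:
    """BitCounter-style check: compare a run of bits, counted from the header start."""
    if first_bit + number_of_bits > 8 * len(payload):
        return False
    as_int = int.from_bytes(payload, "big")
    shift = 8 * len(payload) - first_bit - number_of_bits
    value = (as_int >> shift) & ((1 << number_of_bits) - 1)
    return format(value, f"0{number_of_bits}b") == expected


http_reply = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
# First four octets of a DNS header: ID = 0x1a2b, then the flags 0x0100
# (QR = 0, OPCODE = 0000, RD = 1), i.e. a standard query.
dns_query = bytes([0x1A, 0x2B, 0x01, 0x00])
```

Both helpers receive the payload of the innermost encapsulation layer, so stripping the Ethernet, IP and TCP or UDP headers is assumed to have happened beforehand.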

It is possible to group one or more messages into one single transition. For 
example, in figure 2a it would be possible to replace the HTTP/1.1 200 with 
the grouping HTTP/1.1 2XX. In this case the trace would monitor the rate of all 
successful WWW operations generated by client requests (2XX) instead of only 
observing the occurrence of WWW accesses whose return code is 200 (successful 
request). Figure 5 shows the representation of this grouping (see the Group 
entry in the GroupsSection). 



    MessagesSection 
      Message "HTTP/1.1 200" 
        MessageType: server 
        FieldCounter Ethernet/IP/TCP 0 HTTP/1.1 "Protocol Version" 
        FieldCounter Ethernet/IP/TCP 1 200 "Successful request" 
      EndMessage 
      Message "HTTP/1.1 202" 
        MessageType: server 
        FieldCounter Ethernet/IP/TCP 0 HTTP/1.1 "Protocol Version" 
        FieldCounter Ethernet/IP/TCP 1 202 "Request accepted but not processed" 
      EndMessage 
    EndMessagesSection 
    GroupsSection 
      Group "HTTP/1.1 2XX" 
        Messages: "HTTP/1.1 200", "HTTP/1.1 202", ... 
      EndGroup 
    EndGroupsSection 
    StatesSection 
      FinalState: idle 
      State idle 
        "GET" GotoState 2 
      EndState idle 
      State 2 
        "HTTP/1.1 2XX" GotoState idle 
      EndState 2 
    EndStatesSection 



Fig. 5. Representation of message grouping. 
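One simple way to honour a grouping during monitoring is to map each recognized message to the label of the group that contains it before the state machine is consulted. The dictionary `GROUPS` and the function `transition_label` below are hypothetical names used only for this sketch.

```python
GROUPS = {"HTTP/1.1 2XX": {"HTTP/1.1 200", "HTTP/1.1 202"}}


def transition_label(message, groups=GROUPS):
    """Map an observed message to the group that contains it, if any."""
    for group, members in groups.items():
        if message in members:
            return group
    return message                      # ungrouped messages keep their own label
```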



In some cases the network manager may be interested in observing the 
occurrence of a certain string within the data field of a certain protocol, no matter 
where it is located. To do that, one must use NoOffset as the OffsetType 
parameter in the definition of such a message. This feature is useful, for 
instance, to detect an intrusion attempt. The example presented in figure 6 
defines that every TCP packet must be tested for the occurrence of the string 
/etc/passwd (line 4). 

    1 Message "/etc/passwd" 
    2   MessageType: client 
    3   // OffsetType  Encapsulation    Verb 
    4   NoOffset       Ethernet/IP/TCP  /etc/passwd 
    5 EndMessage 



Fig. 6. Non-specified offset message field. 



We have also created a mechanism to allow a timeout to be set for a 
transition to occur. To do that, one must associate a timeout value (in 
milliseconds) with the message definition (see figure 4a, line 3). When it is not 
defined, a default value is used by the network monitor. 
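Both mechanisms are easy to sketch. The names `contains` and `within_timeout` are hypothetical, and the value of `DEFAULT_TIMEOUT_MS` is an arbitrary assumption, since the text does not state the default used by the network monitor.

```python
DEFAULT_TIMEOUT_MS = 3000   # assumed default; the paper does not state the value


def contains(payload: bytes, needle: bytes) -> bool:
    """NoOffset-style check: the string may appear anywhere in the data field."""
    return needle in payload


def within_timeout(entered_ms, now_ms, timeout_ms=None):
    """True while the transition may still fire."""
    limit = DEFAULT_TIMEOUT_MS if timeout_ms is None else timeout_ms
    return now_ms - entered_ms <= limit
```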



3 The Trace Architecture 

The architecture we propose is an extension of the existing distributed 
management infrastructure standardized by the IETF, adding high-layer protocol 
and network service management capabilities. Figure 7 shows the main 
components of the architecture. It is composed of management stations, mid-level 
managers, programmable monitoring agents and programmable action agents. 
The following sub-sections describe the components of the architecture and their 
interactions with each other. 



[Figure 7 diagram. Legible labels: Management Station; Monitoring Agent] 




Fig. 7. Components of the architecture. 



3.1 Management Station 

The most important activities accomplished by the network manager from a 
management station are (a) registration of mid-level managers, monitoring and 
action agents, (b) specification of protocol traces and actions, (c) specification, 
delegation, observation and interruption of management tasks and (d) receipt 
and visualization of traps. 

As the whole architecture is based on the Script MIB, protocol traces, 
actions and management tasks are scripts executed by monitoring agents, action 
agents and mid-level managers, respectively. Protocol traces are specified by the 
network manager using the PTSL language. Actions are scripts developed using 
Java or any scripting language such as Tcl and Perl. Management tasks may 
also be implemented using any language and coordinate monitoring and action 
agents. Such a script programs the monitoring agents, observes the occurrence of 
the trace and activates action agents when a condition associated with a protocol 
trace holds. The same script may also report events to the management station 
by raising traps. 

At the management station the network manager can specify traces using a 
graphical tool (see an example of such a tool in figure 8) or, if he knows the 
language, by editing a text file. The same occurs with actions and management 
tasks. The specifications of protocol traces, actions and management tasks are 
stored in the database (see flows (1, 2, 3) in figure 7). When a 
management task is about to be delegated, they are mapped to files and stored in 
the repository (4). 




Fig. 8. Prototype of the tool for trace specification. 



Communication between the management station and the mid-level 
managers takes place using the SNMP protocol (Script MIB) (5, 6). The manager 
can delegate a management task to a mid-level manager as well as abort it 
at any time. Intermediate and final results of the execution of a management 
task are stored directly at the Script MIB of the mid-level manager responsible 
for the task and can be retrieved by the management station using the SNMP 
protocol (5, 6). 

The manager may receive traps through an element called trap notifier (21). 
When received, all traps are stored in a database (22). The traps are continuously 
retrieved by a script (3) that updates the manager’s web browser (2, 1) using 
push technology. 

3.2 Mid-level Manager 

Mid-level managers execute and monitor management tasks delegated by the 
management station and report the most important events to it. The number of 
mid-level managers is determined by the network manager and depends on the 
size and complexity of the infrastructure to be managed. 

The process of configuring mid-level managers is the following: the network 
manager defines a management task and stores it at the repository (1, 2, 3, 4). 
Next, the activation of the task must be scheduled using the Script MIB (5, 6). 
In order to do that, the mid-level manager has to be informed about the location 
of the task (script). When activated, the task is retrieved from the repository 
using the HTTP protocol (7) and executed (8). 

The script executed by the mid-level manager installs the protocol trace (9, 
12) and the action script (17, 18), requests the monitoring agent to start 
observing the occurrence of the protocol trace just installed (9, 12), polls RMON2 
variables periodically to monitor the occurrence of the trace (9, 16) and, 
depending on what is observed, dispatches the execution of the action script (17, 18) 
or raises a trap to the manager (21). The script communicates with the agents 
using the SNMP protocol. 
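The control flow of such a script can be sketched with stub objects in place of the SNMP-accessible agents. Everything named here is hypothetical: the classes `StubMonitoringAgent` and `StubActionAgent`, the script name `restart_named.pl`, and the rule that a rising counter means the trace has been observed again.

```python
class StubMonitoringAgent:
    """Stands in for the SNMP-accessible monitoring agent (hypothetical)."""

    def __init__(self, counter_readings):
        self.readings = iter(counter_readings)
        self.trace = None

    def install_trace(self, name):
        self.trace = name

    def poll_counter(self):
        return next(self.readings)


class StubActionAgent:
    """Stands in for the action agent (hypothetical)."""

    def __init__(self):
        self.executed = []

    def run(self, script):
        self.executed.append(script)
        return "restarted"


def management_task(monitor, action, traps, polls):
    monitor.install_trace("DNS service monitoring")   # install and start the trace
    script = "restart_named.pl"                         # action script, assumed name
    last = monitor.poll_counter()
    for _ in range(polls):                              # periodic polling
        current = monitor.poll_counter()
        if current > last:                              # trace observed again
            result = action.run(script)                 # dispatch the action script
            traps.append(("DNS service monitoring", result))  # trap to the manager
        last = current


monitor = StubMonitoringAgent([0, 0, 1, 1])
action = StubActionAgent()
traps = []
management_task(monitor, action, traps, polls=3)
```

A real task would sleep between polls and talk to the agents over SNMP; the stubs only make the decision logic visible.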

The same script may also subscribe, at monitoring and action agents, to the 
traps it wants to receive. The Target MIB is used to identify the management 
task (IP address and UDP port) (9, 10). Using the Notification MIB the mid-level 
manager indicates (through filters) which traps should be sent to the 
management task, whose location was identified at the Target MIB (9, 11). When 
the script receives a trap, it may dispatch the execution of an action (17, 18) or 
correlate it with previously received traps. The result of these operations may 
be reported to the management station (21). 

3.3 Monitoring Agent 

The monitoring agents are responsible for observing the traffic on the network 
segment where they are installed. They are configured by mid-level managers and 
are called programmable because they are able to monitor protocol traces 
delegated dynamically by the network manager. This flexibility is obtained through 
the language presented in section 2. When the mid-level manager sets the 
monitoring agent up (9, 12), the former defines which protocol trace it should retrieve 
(it is indicated within the script that implements the task). Once retrieved (13), 
the trace file is loaded by the monitoring engine and the observation starts (14). 

Whenever the occurrence of the trace is observed between any pair of hosts, 
information is stored within an RMON2-like MIB (15). This MIB is different 
from the standard because the protocolDir group is writable in our approach. 
Therefore the probe stores statistics according to the protocol traces of interest 
to the network manager. Additionally, the granularity of the monitoring becomes 
finer. Instead of storing overall statistics on traffic generated by a given 
protocol, statistics are generated according to the occurrence of specified traces or 
transactions. 

The alMatrix group from the RMON2 MIB stores statistics on the trace when 
the latter is observed between every pair of hosts. Table 1 shows the 
contents of the alMatrixSD table. It gathers the observed number of packets and 
octets exchanged between every pair of hosts (client/server) using the protocol 
traces being monitored by the probe. In the example, two traces were observed: 
Successful WWW access and DNS service monitoring (previously shown in 
figure 2a and b). 



Table 1. Information obtained by referring to the alMatrixSD table. 



Source Address   Destination Address   Protocol                 Packets   Octets 
17.16.10.1       17.16.10.2            Successful WWW access    254       120.212 
17.16.10.6       20.24.20.2            Successful WWW access    20        10.543 
17.16.10.1       17.16.10.33           DNS service monitoring   4         4.350 
17.16.10.32      17.16.10.33           DNS service monitoring   8         7.300 
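How such per-host-pair counters might be accumulated is sketched below. This is not the RMON2 MIB data model: the dictionary `al_matrix` and the function `record_trace` are hypothetical, and the packet sizes are made-up values unrelated to Table 1.

```python
from collections import defaultdict

# (source, destination, trace name) -> [packets, octets]
al_matrix = defaultdict(lambda: [0, 0])


def record_trace(source, destination, trace, packet_sizes):
    """Add the packets of one observed trace occurrence to the host-pair counters."""
    entry = al_matrix[(source, destination, trace)]
    entry[0] += len(packet_sizes)
    entry[1] += sum(packet_sizes)


record_trace("17.16.10.1", "17.16.10.2", "Successful WWW access", [74, 1460])
record_trace("17.16.10.1", "17.16.10.2", "Successful WWW access", [74, 512])
record_trace("17.16.10.1", "17.16.10.33", "DNS service monitoring", [60, 88])
```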



The disadvantage of using the RMON2 MIB is that it does not have objects 
capable of storing information related to performance. For this reason, our group 
is currently considering the possibility of using, in addition to that, an RMON2 
extension, such as the Application Performance Measurement MIB. Table 2 
shows the kind of information stored by this MIB. The first line shows that 
the trace Successful WWW access has been observed 127 times between hosts 
17.16.10.12 and 17.16.10.2. Additionally, the mean response time was 6 seconds. 



Table 2. MIB with information on performance. 



Client        Server        Protocol                 Success.  Unsuccess.  Responsiv. 
17.16.10.12   17.16.10.2    Successful WWW access    127       232         6 sec. 
17.16.10.12   20.24.20.2    Successful WWW access    232       112         17 sec. 
17.16.10.1    17.16.10.33   DNS service monitoring   2         0           3 sec. 



3.4 Action Agent 

Action agents reside in hosts where network services are executed. Their function 
is to perform a given operation on these services. Let us take the DNS service as 
an example. Figure 2b shows a trace that makes it possible to detect when the named 
daemon is not running. There may be a management task, delegated to the 
mid-level manager, that monitors the occurrence of this trace. If the trace is observed, 
the action to be taken is to contact the action agent (see flow (17) in figure 7), 
which is located in the host where the DNS service is installed, and request the 
execution of a script to restart the daemon (18, 19, 20) (see the example of such 
a script, developed in Perl, in figure 9). The result obtained by the action agent is 
accessible to the mid-level manager (17, 18), which may send it to the manager 
for notification purposes (21). 



    #!/usr/bin/perl 
    my $pid; 

    # Verify if the process named is executing. 
    if (-e "/var/run/named.pid") { 
        $pid = `/bin/cat /var/run/named.pid`; 
        chomp $pid;    # drop the trailing newline read from the PID file 
    } 

    # If named is running, restart it using a HUP signal, otherwise 
    # instantiate the process again. 
    if (defined $pid) { 
        print "Restarting named (sending HUP signal)...\n"; 
        `/bin/kill -HUP $pid`; 
    } else { 
        print "Starting named (was not running)...\n"; 
        `/usr/sbin/named &`; 
    } 

    # Test if the process is executing. 
    if (-e "/var/run/named.pid") { 
        $pid = `/bin/cat /var/run/named.pid`; 
        chomp $pid; 
        print "The named daemon is up and running as PID $pid\n"; 
    } else { 
        print "The named daemon could not be started!\n"; 
    } 



Fig. 9. Perl script to restart the named daemon. 



4 Fault Management of Network Services 

Fault management is an important target of the proposed architecture. An example
of fault management concerning high-layer protocols and network services
is checking the availability of a network service and restarting it if it is not
running. Figure 10 shows how this can be achieved using the architecture. In this
case the task delegated to the mid-level manager is supposed to monitor the DNS
service.

Management task

(1) Setup monitoring agent
(2) Setup action agent
(3) Dispatch monitoring of the trace
(4) Monitoring loop {
      - Was the trace "DNS service monitoring" observed?
      - Yes (5) Dispatch action script
            (6) Trap to the management station
    }

Fig. 10. Execution steps of a management task.

This monitoring is performed by sniffing the packets seen in the segment. In
a situation where the daemon responsible for the DNS service is not running,
the agent will observe a DNS request and some time later a Port Unreachable
ICMP message from the serving host. In this case the mid-level manager should
contact an action agent, which resides in the DNS serving host, and request
the execution of a script to restart the daemon (such as the one illustrated in
figure 9). The textual specification of the trace DNS service monitoring is
shown in figure 11.
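The detection rule described above (a DNS request followed some time later by a Port Unreachable ICMP message from the serving host) can be sketched in Python. This is an illustration only, not the trace language of the architecture: the `Event` record, the event labels, and the `timeout` parameter are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float   # seconds since the start of the capture
    kind: str     # "dns_request" or "icmp_port_unreachable" (hypothetical labels)
    client: str   # host that sent the DNS request
    server: str   # host that should be running the DNS service


def count_failed_dns(events, timeout=5.0):
    """Count DNS requests answered by an ICMP Port Unreachable within `timeout`."""
    pending = {}   # (client, server) -> time of the last unanswered request
    failures = 0
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.client, ev.server)
        if ev.kind == "dns_request":
            pending[key] = ev.time
        elif ev.kind == "icmp_port_unreachable" and key in pending:
            # The pending request is consumed whether or not it is in time.
            if ev.time - pending.pop(key) <= timeout:
                failures += 1
    return failures
```

A management task could poll such a counter and dispatch the restart script whenever it increases.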

The architecture was designed to take into account all the standard functional 
areas of management: fault, configuration, accounting, performance and security 
(FCAPS). Our research group has explored in jI2| its characteristics to validate 
the usefulness of the architecture for the management of high-layer protocols 
and network services. 

5 Conclusions 

This work presented a distributed architecture for the management of high- 
layer protocols and network services based on programmable agents. Motivated 
by the growing need companies have to monitor high-layer protocols and their 
critical applications, the work proposes a flexible architecture able to follow 
the fast dissemination of protocols and networked applications (that need to 
be managed). The architecture may be used either in corporate networks or in 
application service providers. 

The most important contribution of the architecture is the granularity of the 
monitoring. The observation of network traffic on a transaction basis makes the 
understanding of protocol and networked application behaviors possible. The 
language proposed to specify protocol traces is simple, but the network manager 
has to know the format of the packets exchanged by the application or protocol 
to be managed. 

Another significant contribution of the architecture is the possibility to do
more than just monitoring. Management tasks provide the manager with mechanisms
to monitor the occurrence of protocol traces and to dispatch management
scripts executed by programmable action agents. These mechanisms contribute
to management automation in some scenarios.

Trace "DNS service monitoring"
Version: 1.0
Description: Trace to detect when named is not running
Key: named, fault, DNS service
Port:
Owner: Luciano Paschoal Gaspary
Last Update: Tue, 10 Aug 2000 15:30:58 GMT

MessagesSection
  Message "DNS Request"
    //See code in figure 4b.
  EndMessage
  Message "ICMP Message"
    MessageType: server
    // OffsetType Encapsulation FirstBit NumberOfBits Verb     Description
    BitCounter    Ethernet/IP   0        8            00000011 "Type field=00000011?"
    BitCounter    Ethernet/IP   8        8            00000011 "Code field=00000011?"
  EndMessage
EndMessagesSection

StatesSection
  //See code in figure 3b.
EndStatesSection
EndTrace

Fig. 11. Trace to monitor the DNS service.
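The "ICMP Message" definition of the DNS service monitoring trace (Fig. 11) compares two 8-bit fields located right after the Ethernet/IP encapsulation with the pattern 00000011. A minimal Python sketch of the same test is given below; the function name and the assumption that the caller has already stripped the Ethernet and IP headers are ours, not part of the prototype.

```python
def is_port_unreachable(icmp: bytes) -> bool:
    """Return True if the ICMP header carries Type 3 and Code 3.

    `icmp` is assumed to start at the first byte after the IP header.
    """
    if len(icmp) < 2:
        return False
    icmp_type = icmp[0]   # bits 0..7 after the encapsulation
    icmp_code = icmp[1]   # bits 8..15 after the encapsulation
    return icmp_type == 0b00000011 and icmp_code == 0b00000011
```

In ICMP, Type 3 is Destination Unreachable and Code 3 is Port Unreachable, which is what a host returns when no process listens on the DNS port.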

As described by Strauß [12], “distributed management in general and
the Script MIB specifically are expected to bring various advantages over the
centralized concept suited for the raising demands in network management. A
commonly mentioned advantage is the increased scalability due to the delegation
of management tasks from the centralized network management station to
mid-level managers. This implies that CPU and network load is also delegated
to the subnets to which the mid-level managers belong. Another major advantage
is concerned with the robustness of management tasks. While centralized
management systems require a reliable network, the distributed approach allows
to delegate some sensible manager functions next to the observed agents.
Hence these functions may become independent from less reliable WAN links,
for example”.

The architecture requires more work to be controlled than a single central- 
ized management system. The management of its components becomes more 
complicated. It is necessary to distribute and update scripts, control running 
scripts, gather and correlate intermediate and final results. We believe that such 
operations, as well as the specification of traces, can be simplified by adding 
an easy-to-use interface to the management application. Currently, our research 
group is working on the improvement of the prototype. After that, a larger scale 
validation will be done.

Abstract. This paper presents a new scalable buffer-management scheme
for IP Differentiated Services. The scheme consists of a Differentiated
Random Drop (DRD) algorithm using feedback from a virtual scheduler.
DRD chooses a queue in which to perform an early packet drop to avoid
congestion according to a specific probability function. It will be shown that
DRD in conjunction with first-come first-served scheduling is able to support
relative service differentiation. The virtual scheduler is introduced to enable
service differentiation in terms of bandwidth and delay at the same time.
A virtual scheduler runs in parallel to the real scheduler and maintains
virtual queue lengths that are used by the congestion avoidance
scheme for packet-drop decisions. Scheduling packets for transmission is
performed by the real scheduler only.



1 Introduction 

In the past few years many different scheduler and queue-management algo- 
rithms have been proposed. Research activities have been and still are focused on 
how to satisfy the Quality-of-Service (QoS) requirements of higher-priority flows 
while keeping fairness among classes and preventing starvation of low-priority 
traffic. 

In Weighted Fair Queueing (WFQ) schedulers such as Self-Clocked Fair
Queueing (SCFQ) and other rate-proportional service disciplines, queue
weights are used to provide per-flow¹ bandwidth guarantees: the link share of
backlogged connections is proportional to the queue weight, and excess bandwidth
is distributed in the same manner. Flows that are significantly below
their reserved bandwidth share will experience less delay. This means that
delay-sensitive services such as Voice-over-IP (VoIP) can be implemented using
WFQ only in combination with a token bucket at the ingress node whose rate is
set lower than the reserved rate. In Class-Based Queueing (CBQ), delay
differentiation can be achieved using a priority-based packet-scheduling
algorithm for bounded traffic classes. In all these schemes, thresholds
(for scheduler and token buckets) have to be set carefully, and often their
meaning is not intuitive. Usually, this is done in a static manner when the
network is set up, and often default parameters are not modified at all.

¹ In general the term micro-flow is defined by the 5-tuple of source and destination
address and port number, and the protocol number in the IP header of the packet.
Macro-flows may consist of a large number of micro-flows forming a flow aggregate.
For the sake of simplicity, this paper refers to flow aggregates simply as flows, and
accordingly to a single micro-flow as a flow.

This paper focuses on a new threshold-based buffer-management scheme that
consists of a combination of Differentiated Random Drop (DRD) and virtual
scheduling. Unlike other known schemes, the proposed scheme supports simultaneous
bandwidth and delay differentiation and has the following advantages:

— Dynamic drop rate adaption of traffic classes, 

— efficient and early congestion avoidance, and 

— easy setting up of thresholds. 

It will be shown that the scheme is suitable for a Diffserv-enabled network,
where it can be used to implement relative QoS guarantees in Assured
Forwarding (AF) per-hop-behaviors.

The elements of active buffer management are algorithmic droppers,
packet-marking strategies, and scheduling algorithms. Several extensions that
are combined in a flow-and-queue threshold-based buffer-management scheme use
active buffer-management elements. These extensions are fundamental to fair
packet dropping and better overall buffer usage. Furthermore, DRD is briefly
introduced as an efficient congestion avoidance algorithm. It will be shown that
DRD in conjunction with threshold-based buffer management and simple first-come
first-served (FCFS) scheduling is able to provide service differentiation in terms
of dynamic and adaptive packet drop rates, which are relative to other queues in
the system. The efficiency of a virtual scheduler, which is the key to bandwidth
and delay differentiation, is compared to that of a simple FCFS scheduler using
extensive simulations realized in a modified version of the network simulator
ns that was specifically extended for this purpose.

The remainder of this paper is organized as follows. Based on the detailed
discussion of the threshold-based buffer management in Section 3, the virtual
scheduler is introduced in Section 4. A simulation model utilizes the Diffserv
architecture to provide insights into the combination of FCFS and DRD with a
virtual scheduler (Section 5). Finally, Section 6 summarizes the advantages and
draws conclusions.



2 Related Work 

In [10], Dovrolis et al. propose a proportional differentiation model to refine and
quantify relative service differentiation. Two packet schedulers that approximate
the model are introduced and evaluated in simulations. The proportional model
is applied to queueing-delay differentiation only and leaves the problem of
coupled delay and loss differentiation for future work.

Fig. 1. Architecture of buffer management.

Risso discusses Decoupled Class Based Scheduling (D-CBQ), a CBQ-derived
scheduling algorithm that uses new link-sharing guidelines to decouple
bandwidth and delay for bounded classes. The algorithm improves the delay
characteristics of bounded classes compared to CBQ. Although setting higher
priorities (that is, lower delays) no longer leads to more bandwidth being
allocated, incoming traffic still has to be limited by means of an additional
token bucket filter. Note that the impact on delay and bandwidth using unbounded
classes has not been studied in a severely overloaded network environment.



3 Threshold-Based Buffer Management 

Figure 1 shows a flow-and-queue threshold-based buffer-management scheme.
Thresholds are assigned to flows and queues. As can be seen on the left-hand
side, each of these flows is attributed to one queue and several flows can enter
the same queue. In general, packets with the same QoS needs will enter the same
queue, although there may be multiple queues having approximately the same
properties to differentiate, for example, between TCP and UDP traffic. On the
right-hand side, the same flows are shown in the context of overall buffer space.
The process of packet classification will not be discussed as it would exceed the
scope of this paper.

As indicated by its name, flow-and-queue threshold-based buffer management
is a scheme primarily based on two thresholds. The first threshold limits global
buffer occupancy of a flow and is called the per-flow threshold. This means that
flows exceeding their per-flow threshold undergo a special treatment such as
marking or dropping packets. Whether packets are marked or dropped depends on
the type of buffer management and will be discussed later. Per-flow thresholds
are measured relative to the total buffer space used.

The second threshold is a per-queue threshold, which allows a segmentation
of the available buffer space and is compared to the buffer space used by this
queue. When the per-queue threshold is exceeded, packets have to be dropped to
limit the maximum packet delay. When used without additional strategies, the
per-queue threshold acts as a “hard” dropping policy. Packets may suddenly be
dropped in bursts when the queue size exceeds the threshold. Clearly this
behavior is not desirable. An early dropping policy such as Random Early Detection
(RED) or DRD should be combined with this threshold. For this purpose
DRD will be introduced in the next Section. Later it will be shown that DRD
outperforms traditional RED, in addition to having other useful properties.

Using per-queue thresholds allows more than the actually existing buffer space
to be allocated to queues, as opposed to hard-segmented buffer spaces.
This means that the sum of all per-queue thresholds may exceed the
total available buffer space. The advantage of such a strategy is that it supports
larger bursts of a flow when other flows are on a low buffer-usage level or not
backlogged at all and, therefore, uses less global buffer space. On the other hand,
when all flows send at a peak rate, the fixed per-queue buffers cannot be fully
exploited at the same time. At this point the per-flow threshold will act as a
limiter. In conjunction with RED or DRD, this limit is not a hard limit and
therefore does not cause bursty packet drops. The service rate will no longer be
absolute but rather relative to other classes. This is even mandatory for giving
best-effort traffic the capability of taking advantage of unused bandwidth.

3.1 Hard Dropping Scheme 

The simplest buffer-management scheme known consists of dropping packets 
when no more buffer space is available. This strategy, commonly used in the 
past and even today, turns out to be inadequate for performing efficient and fair 
packet forwarding even when used with fair queuing. Packets are often dropped in 
“bursts” from a single flow, whereas other flows increase their traffic even more. 
As a result, fairness suffers and QoS requirements simply cannot be guaranteed. 

Adding flow-and-queue threshold-based buffer management allows buffer sharing
and priority handling. In addition to being dropped when no buffer space is
available, packets are dropped when one or both thresholds are exceeded. Earlier
simulations show that such a simple flow-and-queue threshold-based
buffer management is not sufficient per se to guarantee services as defined
in Diffserv.

In general, the thresholds used in this mechanism act as a hard limit. During
congestion periods no indication is given, and packets are suddenly dropped
in large bursts once a threshold has been exceeded. There must be some additional
packet-dropping strategies to avoid bursty drops and synchronization of
TCP sources.
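The hard dropping rule can be written down as a short Python sketch. It rests on assumptions that go beyond the text: thresholds are expressed as fractions of the total buffer size, the per-flow threshold is compared with the total buffer occupancy, and the per-queue threshold with the occupancy of the packet's own queue. All names are hypothetical.

```python
def hard_drop(pkt_len, total_used, queue_used, buffer_size,
              per_flow_thr, per_queue_thr):
    """Return True if an arriving packet is dropped under hard limits.

    Sizes are in bytes; both thresholds are fractions of buffer_size
    (an assumption of this sketch).
    """
    if total_used + pkt_len > buffer_size:
        return True    # no buffer space left at all
    if total_used + pkt_len > per_flow_thr * buffer_size:
        return True    # per-flow threshold exceeded
    if queue_used + pkt_len > per_queue_thr * buffer_size:
        return True    # per-queue threshold exceeded
    return False
```

Because every condition is a step function of the occupancy, a burst arriving just above a threshold is dropped in its entirety, which is the behavior the following section tries to soften.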

3.2 Softening Hard Limits 

One way to overcome the hard dropping nature of a threshold is to introduce 
a steadily increasing probability function depending on the average queue size. 
A linearly increasing function is used because of its simplicity: it is sufficient to 
add an additional threshold, which can be as simple as a default percentage of 
the per-flow threshold. The two thresholds together are then used as a lower and 
upper limit (max/min per-queue thresholds). 

The size of the linearly increasing section has to be set relative either to the
global buffer space or to the allocated queue space. Too small a value does not
overcome the bursty drop problem, and too large a value will introduce premature
packet drops. Without going into more detail concerning the optimum setting,
which would be beyond the scope of this paper, experiments have shown that a
value of approximately 50% is reasonable.
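A linearly increasing drop probability between a lower and an upper threshold can be sketched as follows. The function is a simplified illustration: the name, the `min_fraction` parameter, and the choice of letting the probability reach 1 at the upper threshold are assumptions, and the average queue size is taken as given.

```python
def drop_probability(avg_queue, max_thr, min_fraction=0.5):
    """Linear drop probability between min_thr and max_thr.

    min_thr is derived as a default percentage of max_thr (50% here).
    """
    min_thr = min_fraction * max_thr
    if avg_queue <= min_thr:
        return 0.0                      # below the lower limit: never drop
    if avg_queue >= max_thr:
        return 1.0                      # above the upper limit: hard drop
    return (avg_queue - min_thr) / (max_thr - min_thr)
```

With `max_thr = 100` and the default fraction, an average queue size of 75 yields a drop probability of 0.5.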

3.3 Introducing Packet Marking 

Service differentiation within a flow can be achieved by introducing a set of drop
precedences. A simplistic approach would be to mark all packets that have a
higher precedence than the low default drop precedence. In doing so, almost
exclusively packets having a higher drop precedence will be dropped, and
differentiation between more than two drop precedences will no longer be feasible.
Therefore, the proposition is to assign per-flow thresholds to drop precedences,
so that each drop precedence has its own per-flow threshold. Multiple flows with
different per-flow thresholds may coexist in the same queue. Marked packets in a
queue may belong to different flows, and no distinction according to the initially
given drop precedence is made. The marking is used when drop decisions have to
be made. The per-flow drop rate increases with decreasing per-flow threshold.
Packet marking can be regarded as a “previous conviction” with a scope that is
strictly limited to the actual router.

If per-flow thresholds are used to trigger packet drops, a small per-flow threshold
will completely prevent a flow from obtaining any service when network traffic
is high. A part of the global buffer space remains unused because it is reserved
for other traffic classes that perhaps will not occupy this space in the near
future. One way to improve buffer usage is to increase the per-flow threshold, but
then service differentiation becomes more difficult because these thresholds move
closer together. Nevertheless, an arriving packet belonging to such a flow could
be enqueued and marked. If packets need to be dropped later, those marked
should be dropped first. As a result buffer space is used more efficiently, and
overall more packets are served. A sophisticated packet-dropping scheme can
take into account the per-queue buffer usage as well as a relative queue priority,
and then select a packet to be dropped.
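The idea of enqueuing and marking a packet instead of dropping it, and of dropping marked packets first later on, can be sketched in Python. The data layout and function names are hypothetical, and dropping from the tail when no marked packet exists is an assumption of this sketch rather than something the text prescribes.

```python
def enqueue(queue, pkt_id, over_flow_threshold):
    """Append a packet; mark it if its flow exceeds the per-flow threshold."""
    queue.append({"id": pkt_id, "marked": over_flow_threshold})


def drop_one(queue):
    """Remove and return one packet, preferring a marked one.

    The remaining packets keep their order, so marking does not
    reorder the traffic of a class.
    """
    for i, pkt in enumerate(queue):
        if pkt["marked"]:
            del queue[i]
            return pkt
    # No marked packet: fall back to the tail (assumption of this sketch).
    return queue.pop() if queue else None
```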

3.4 Algorithmic Dropper and Congestion Avoidance Scheme 

Upon packet arrival, the algorithmic dropper examines whether the packet should
be enqueued and whether an additional action should be taken. An additional
action is an action that tries to prevent congestion in the network. Figure 2
illustrates the algorithmic dropper, which consists of three parts. The first is
used for congestion avoidance and evaluates whether an existing packet has to be
dropped in one of the queues by choosing a queue randomly. If yes, a packet drop
in this queue is triggered. The second part consists of hard dropping limits
(tail-drop) for queue and buffer overflows. The congestion avoidance scheme
should drop packets earlier so that tail-drops occur only rarely. In the third
part, packet marking to implement drop precedences in a queue occurs.

Fig. 2. Flow-chart diagram of an algorithmic dropper.

The congestion-avoidance block uses DRD to evaluate potential packet drops.
The main goal of the DRD scheme is to introduce a dynamic per-queue drop
probability while adding relative dependency among the various queues in the
system. This will primarily allow service differentiation such as “better than”
another class. Each time a packet arrives, the following congestion-avoidance
mechanism is performed before processing of that packet continues. Using a
dynamic per-queue probability, one of the queues is chosen randomly and random
early discard is then performed in this queue. The per-queue probability p_i is
evaluated as follows: every queue is assigned a fixed priority equal to the queue
number i. Thus the queues are sorted according to their priority. The per-queue
probability is proportional to the number of bytes in the queue plus the number
of bytes in all higher-priority queues. This is a more general approach than the
one used for RIO. Clearly, queues containing no packets have zero per-queue
probability. Note that priorities are introduced only for dropping behavior and
not, as for example in CBQ, as a per-queue priority used for scheduling
purposes. In addition, higher priority does not imply lower packet delay. The
per-queue probability p_i can be written as



\[
p_i =
\begin{cases}
C \sum_{k=1}^{i} b_k & \text{if } b_i \neq 0 \\
0 & \text{if } b_i = 0
\end{cases}
\tag{1}
\]



where b_k is the number of bytes in queue k. For N queues the normalization
that leads to the constant C is then given by

\[
\sum_{i=1}^{N} p_i = 1
\;\Rightarrow\;
C = \frac{1}{\sum_{i=1,\; b_i \neq 0}^{N} \; \sum_{k=1}^{i} b_k}
\tag{2}
\]



Finally, if a packet in that queue has to be dropped, by preference a marked 
one will be chosen. 
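The per-queue probability and the random choice of a queue can be illustrated with a short Python sketch. It assumes that queue 1 has the highest priority, so that the sum for queue i runs over queues 1 to i, and that empty queues are excluded from the normalization. The function names and the list-based representation are hypothetical.

```python
import random


def drd_probabilities(queue_bytes):
    """Per-queue drop-selection probabilities p_1..p_N.

    queue_bytes[i - 1] is b_i, the number of bytes in queue i.
    """
    weights = []
    running = 0
    for b in queue_bytes:
        running += b                           # b_1 + ... + b_i
        weights.append(running if b > 0 else 0)
    total = sum(weights)                       # equals 1 / C
    if total == 0:
        return [0.0] * len(queue_bytes)
    return [w / total for w in weights]


def choose_queue(queue_bytes, rng=random):
    """Return the 1-based number of the queue selected for an early drop."""
    probs = drd_probabilities(queue_bytes)
    if not any(probs):
        return None                            # all queues are empty
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]
```

For the occupancies 100, 0 and 300 bytes the weights are 100, 0 and 400, so queue 3 is selected four times as often as queue 1 and the empty queue 2 is never selected.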
Fig. 3. Architecture of buffer management.



4 Virtual Scheduler 

The above-mentioned scheme is able to support relative service differentiation
even with a simple FCFS scheduler. Service classes in terms of “better than”
can be implemented, and differentiation is expressed in lower drop rates for
lower-numbered (higher-priority) queues. Whereas the scheme is able to support
minimum bandwidth guarantees and fair excess bandwidth allocation, it fails
in differentiating packet delays due to the simple FCFS scheduler. The idea is
to combine two schedulers while keeping their advantages: The first scheduler
will maintain fair packet scheduling and enable delay differentiation. For this a
WFQ scheduler can be used. The second scheduler will be responsible for early
congestion avoidance and will start dropping packets if necessary. It maintains
virtual queue lengths used by the congestion avoidance scheme as feedback.
This scheduler is called virtual because it does not directly influence the
departure time of packets in the buffer. Its result, the virtual queue lengths,
is only used by the algorithmic dropper to make drop decisions.

Figure 3 illustrates the architecture of the buffer-management scheme using a
virtual scheduler. The scheme of Section 3 has been taken as a basis and enhanced
to support a virtual scheduler. The buffer is divided into several queues. The
number of queues is configurable, and the queues are served in a WFQ manner.
For each queue several parameters are given (queue number, max/min per-queue
threshold and queue weight). In contrast to other schemes, these parameters
are fixed once and for all at the beginning, and no tuning is required later. In
addition, the parameters in the queues with relative delay differentiation are
equal (queues 2 to 4), thus making configuration easy. In Section 5 it will be
shown that such a scheme is capable of providing a dedicated service class to a
given queue. When a packet arrives, it will go through the algorithmic dropper
with the only modification that virtual queue lengths are used rather than the
real ones to trigger a packet drop. Meanwhile all packets are also served by the
virtual FCFS scheduler.



A Buffer-Management Scheme for Bandwidth and Delay Differentiation



Buffer management becomes quite difficult because the two schedulers serve
packets simultaneously. Therefore, a special packet tag that is attributed to each
of the packets and contains all necessary information has been introduced. In
other words, if a packet has been treated by the real scheduler (and therefore has
been sent to the outgoing link) but has not yet been served by the virtual
scheduler, the packet tag will remain stored in memory while the space used for
the real packet can be freed. If a packet has been treated first by the virtual
scheduler but not yet by the real scheduler, the packet tag and the packet itself
will remain stored in memory until the packet has been served by the real
scheduler. It is clear that by introducing packet tags, which can remain in memory
longer than a packet’s lifetime, overall memory usage will increase. This issue is
discussed later, where it is shown that the increase in memory is limited.
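The lifetime rules for packets and their tags can be summarized in a small Python sketch. The class and attribute names are hypothetical and only model the bookkeeping described above, not the simulator code.

```python
class PacketTag:
    """Bookkeeping record shared by the real and the virtual scheduler."""

    def __init__(self, length):
        self.length = length             # packet size in bytes
        self.payload_in_memory = True    # the real packet is still buffered
        self.served_real = False
        self.served_virtual = False

    def serve_real(self):
        """The real scheduler sent the packet; its buffer space can be freed."""
        self.served_real = True
        self.payload_in_memory = False

    def serve_virtual(self):
        """The virtual scheduler served the packet; the payload is untouched."""
        self.served_virtual = True

    @property
    def tag_releasable(self):
        """The tag itself may be freed once both schedulers are done."""
        return self.served_real and self.served_virtual
```

A tag whose packet was sent by the real scheduler first keeps counting towards the virtual queue length until the virtual scheduler has served it as well.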

4.1 Parameter-Setting Guidelines 

The following guidelines should help set the parameters of a flow-and-queue 
threshold-based buffer-management scheme with N queues: 

— The queue weight Wi corresponds to the minimum bandwidth guarantee for 
the service class in queue i and is a part of Service Level Agreements (SLA) . 

— For equal per-queue threshold settings, lower-delay classes are in higher- 
numbered queues. 

— Delay-insensitive queues get a high per-queue threshold to profit from unused 
buffer space. 

— For classes that contain adaptive flows, the minimum per-queue threshold 
is set to half the maximum per-queue threshold. Non-adaptive flows do not 
react to packet drops and have equal min/max per-queue thresholds. 

— Drop precedences can be set equal over a set of queues. The higher the per- 
flow threshold, the lower the drop probability. Setting the per-flow threshold 
to 1 disables packet marking for this flow, but packets can still be dropped 
in a severely overloaded network. 
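Two of the guidelines above translate directly into small helper functions, sketched here in Python. The function names are hypothetical, and thresholds are assumed to be fractions of the buffer size.

```python
def queue_thresholds(max_thr, adaptive):
    """Return (min_thr, max_thr) for one queue.

    Classes with adaptive flows get min = max / 2; non-adaptive flows
    do not react to drops, so min and max are equal.
    """
    min_thr = max_thr / 2 if adaptive else max_thr
    return (min_thr, max_thr)


def marking_enabled(per_flow_thr):
    """A per-flow threshold of 1 disables packet marking for the flow."""
    return per_flow_thr < 1.0
```

For example, `queue_thresholds(0.4, adaptive=True)` yields a lower limit of 0.2 and an upper limit of 0.4.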



5 Simulation Results 

The simulations described here have been made in a Diffserv-enabled network
environment. Flow-and-queue threshold-based buffer management maps well to
the Diffserv classes for the following reasons:

Number of queues: For a system to scale well, the number of queues is important.
It is clear that flow-and-queue threshold-based buffer management does not
treat micro-flows individually, but only applies the service defined for the
corresponding service class. Thus, the buffer-management system keeps the number
of queues low, in a Diffserv environment generally not more than several dozen.
Packet classification is based on the Diffserv Codepoint, but other classification
rules could also be envisaged.

Fig. 4. Simulation topology.

Fig. 5. Per-flow and per-queue thresholds.



Packet delay: Setting a low per-queue threshold assures low packet delay, and 
bandwidth is guaranteed by the queue weight of the WFQ scheduler. This can 
be used to implement Expedited Forwarding (EF) per-hop-behavior. 

Relative service classes: Using a virtual scheduler, service classes in terms of
“better than” or Olympic service, which consists of three service classes,
namely gold, silver and bronze, can be implemented. The Diffserv AF
per-hop-behavior can be used to identify the service class of a packet.

Support of drop precedences: Various packet markers have been proposed in the
Diffserv working group. These markers use the result of a traffic meter
to set the appropriate Diffserv Codepoint (DSCP). They should not be confused
with packet marking as introduced in this paper. The marking strategy proposed
here differs from these Diffserv markers because it acts only locally in a router.
Per-flow thresholds are assigned to Diffserv drop precedences in AF to achieve
dropping differentiation. Although packets are only either marked or not, this
is sufficient to support multiple levels of drop precedences. The DSCP is not
modified in the process, but can influence the marking done by the scheme.

5.1 Service Differentiation for AF without Virtual Scheduler 

In this Section FCFS and SCFQ schedulers without a virtual scheduler are
compared. These two scheduler types have been chosen to discuss their main
properties when combined with DRD and to show that these properties cannot be
maintained at the same time. The topology of the simulation is shown in
Figure 4, where multiple sources as given in Table 1 share the same outgoing link
at a router. The router uses the DRD scheme as explained in Section 3. The first
queue is assigned to an EF Diffserv class. Ten Telnet applications generate the
traffic for this flow. This traffic is substantially lower than the reserved rate, and
other flows may borrow from this unused bandwidth. The following three queues
treat three AF Diffserv classes: AF1, AF2 and AF3. These flows are generated
by CBR and multiple Pareto on/off sources, which create an equal number of
all three drop precedences in each AF class. The sources have been chosen so
as to be extremely bursty. The average sending rates for all three AF Diffserv
sources are equal and vary from 50 to 150% of the allocated bandwidth. The
highest-numbered queue is designated for adaptive best-effort (BE) traffic. A set
of greedy TCP connections generates this traffic. All queues have equal weights
and, therefore, equal reserved bandwidth. All links are set to 10 Mbit/s. The
maximum buffer space is set to 160 kBytes. During the simulation, all sources
are sending data at the rates given in Table 1.
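The reserved rates implied by this setup follow from simple arithmetic, shown here as a Python sketch. It assumes five queues (EF, three AF classes and BE) sharing the 10 Mbit/s link with equal weights; the constant and function names are chosen for this example.

```python
LINK_RATE = 10_000_000   # bit/s, all links are set to 10 Mbit/s
NUM_QUEUES = 5           # EF, AF1, AF2, AF3 and BE (assumed from the text)

# Equal weights give every queue the same reserved share of the link.
RESERVED_RATE = LINK_RATE / NUM_QUEUES


def af_sending_rate(percent_of_reserved):
    """Average AF sending rate in bit/s for a load given in percent."""
    return RESERVED_RATE * percent_of_reserved / 100
```

Under these assumptions each class has 2 Mbit/s reserved, and the AF sources sweep from 1 Mbit/s (50%) to 3 Mbit/s (150%) per class.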



Table 1. Traffic sources.

Flow   Sources                Rate [% of reserved rate]
EF     Telnet sources
AF1y   CBR & Pareto On/Off    from 50% to 150%
AF2y   CBR & Pareto On/Off    from 50% to 150%
AF3y   CBR & Pareto On/Off    from 50% to 150%
BE     greedy TCP sources


The buffer settings are shown in Table 2 and Figure 5. The per-queue thresholds
are set to guarantee a maximum delay for each class. The terracing of the
per-flow thresholds in an AF class is important to realize drop precedences. The
thresholds AFx1, AFx2 and AFx3 are the same for all AF queues. When setting
up the thresholds, no differentiation among the same drop precedence of
different AF classes has to be performed. All AF classes start dropping packets
at the same per-queue limit. The best-effort RED threshold is set to 40% of its
per-queue threshold to enable early congestion avoidance even when the buffer
space has been almost completely filled up by other sources.



Table 2. Buffer thresholds.

Thresholds   EF    AFx1   AFx2   AFx3   BE
Per-Flow     1.0   1.0    0.8    0.6    0.4
Per-Queue    0.2          0.4           1.0



The main goal of introducing packet marking as mentioned in Section 3.3
is to support service differentiation in the form of AF drop precedences and to
improve overall buffer usage. Packet marking does not influence packet order,
and packets belonging to the same traffic class will leave the router in the same
sequence as they arrived.

The results shown in Figure 6 illustrate the differentiation among AF classes
when FCFS or SCFQ scheduling is used. With SCFQ the scheduler completely
dominates bandwidth allocation. Minimum-bandwidth guarantees for best-effort
traffic can be given with both schedulers if packet marking is used. Without
packet marking, best-effort traffic starts oscillating and loses reserved bandwidth
even with SCFQ scheduling.

Fig. 6. Comparing AF and BE bandwidth for FCFS and SCFQ scheduling.




[Plot; x-axis: AF Link Usage [%]]

Fig. 7. Packet drop rate for various traffic classes using a FCFS scheduler.



With the given per-flow thresholds, every AF class is split into three drop
precedences (Figure 7). Because of the high network load (when AF classes are
sending more than 120% of the reserved rate), the buffer space of a router is
almost completely filled up at any time, and the third per-flow threshold is too
low to take effect. Nevertheless the Diffserv requirement of having at least two
drop levels is satisfied. Relative service differentiation in terms of packet drop
rates is clearly visible.

The packet delays are shown in Figure 8. For scheduling, the SCFQ scheduler
takes the packet arrival time as well as the number of packets stored in a
queue into account, whereas with an FCFS scheduler all queues experience the
same average delay because FCFS cannot distinguish among the queues. Therefore,
with FCFS scheduling, the average delays for traffic classes other than
best-effort are shifted towards the best-effort values when the actual AF
bandwidth is lower than the reserved rate. With SCFQ, however, delays have an
upper bound given by the per-queue thresholds. This is not the case for FCFS
scheduling, where only overall buffer occupancy influences packet delay. To be
more precise, decreasing the per-queue thresholds would lower overall buffer
usage, because packets would be dropped earlier to avoid congestion, and would
have an equal effect on packet delays in all queues. On the other hand, we have
seen that a WFQ scheduler imposes its fairness properties in such a way that
traffic differentiation is only feasible through static threshold settings.
Although the average delay can be kept within an acceptable range, no
significant delay differentiation can be realized with FCFS. Non-best-effort
delays are always larger with FCFS for sources using less than their full share
of bandwidth (Figure 8). Here packet delay could be improved by using a virtual
scheduling algorithm or a “weak” WFQ scheduler that allows higher-priority
packets to bypass others.
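
To make the contrast with FCFS concrete, the sketch below shows the tagging rule of a fair-queueing scheduler, assuming that SCFQ denotes self-clocked fair queueing. It is a minimal illustration of the general technique, not the simulator used for the results above; the class name, queue names, weights and packet lengths are hypothetical.

```python
import heapq


class SCFQScheduler:
    """Minimal self-clocked fair queueing sketch (illustrative only)."""

    def __init__(self, weights):
        self.weights = dict(weights)          # queue id -> relative weight
        self.last_tag = {q: 0.0 for q in self.weights}
        self.virtual_time = 0.0               # tag of the packet last served
        self._heap = []                       # (tag, arrival seq, queue, length)
        self._seq = 0

    def enqueue(self, queue_id, length):
        # A packet starts after the previous packet of its own queue or at
        # the current virtual time, whichever is later.
        start = max(self.virtual_time, self.last_tag[queue_id])
        tag = start + length / self.weights[queue_id]
        self.last_tag[queue_id] = tag
        heapq.heappush(self._heap, (tag, self._seq, queue_id, length))
        self._seq += 1

    def dequeue(self):
        if not self._heap:
            return None
        tag, _, queue_id, length = heapq.heappop(self._heap)
        self.virtual_time = tag               # the "self-clocking" step
        return queue_id, length


def demo_order():
    """Serve two backlogged queues with weights 2:1 and equal packet sizes."""
    sched = SCFQScheduler({"AF": 2.0, "BE": 1.0})
    for _ in range(6):
        sched.enqueue("AF", 100)
    for _ in range(6):
        sched.enqueue("BE", 100)
    return [sched.dequeue()[0] for _ in range(6)]


if __name__ == "__main__":
    print(demo_order())
```

With both queues backlogged, the queue with twice the weight is served twice as often, whereas an FCFS scheduler would simply serve packets in arrival order regardless of class.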

5.2 AF: Gold, Silver and Bronze Services Using a Virtual Scheduler 

To facilitate comparison of the results shown above using one scheduler with 
those described below, the same settings have been used but a virtual sched- 
uler has been added. Again, the incoming traffic for the AF classes stems from 
CBR and multiple Pareto on/off sources. The AF sending rate range has been 
increased to 200% of the reserved rate to show that even with severe oversub- 
scription tail drops are rare. 

The above-described results show how packet drop rate differentiation can
be achieved with FCFS while packet delay remains the same for all packets
traversing the router. The virtual scheduler scheme has been introduced to
overcome this weakness. Figure 9 shows the average packet delays for all
queues. Surprisingly, what was better in terms of drop precedence in simple DRD
now becomes worse in terms of delay: the packet delay is shorter in
higher-numbered AF classes, and therefore AF3x has the best performance in
terms of delay. As AF itself does not specify any particular relationship
between AF per-hop behaviors, the AF numbering introduced earlier will be kept.
In addition, it can be seen that the delay is bounded for each class
separately. An intuitive explanation of this result is that DRD combined with a
virtual scheduler will start dropping packets earlier in higher-numbered AF
classes because of the higher DRD drop probability of those classes, whereas
the real scheduler maintains the fair bandwidth allocation and therefore
assures equal drop rates for equal incoming traffic.

Fig. 8. Comparing packet delays without virtual scheduler.

Fig. 9. Packet delays for different classes using a virtual scheduler.

Fig. 10. Packet drop rate for all AF classes including the drop precedence.

In contrast to the simple DRD case, with FCFS packet drop rates are similar 
for all AF classes. Again only two levels of drop precedence in one class are 
visible (Figure 10). It will be shown later that this is only the case when all
classes send at the same rate. 

The results show that the virtual scheduler scheme is able to differentiate 
packet delays while giving a strict bandwidth guarantee according to the config- 
ured weight. Excess bandwidth is distributed according to the queue weights. 

5.3 Delay Distribution 

Packet delays are distributed approximately normally, as shown in Figures 11
and 12. The latter is a quantile-to-quantile plot, in which the straight line
indicates a linear least-squares fit. The slightly S-shaped plots indicate that the 
distribution is peakier and has shorter tails than a normal distribution. This 
stems from the fact that delays cannot be negative and that overall buffer space 
is limited. The delay differentiation is due to the intrinsic behavior of the scheme 
rather than to sudden queue flushes or other undesired effects. In addition to 
the lower delay, the AF3y class also has a smaller delay variance, making it an 
attractive candidate for a “better than” service class. 
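
For readers who want to reproduce this kind of check, the following sketch builds the points of a normal quantile-to-quantile plot and fits a straight line by least squares. It uses only the Python standard library; the plotting positions (k + 0.5)/n and the synthetic delay values in the usage part are assumptions made for the example, not data from the simulations.

```python
from statistics import NormalDist, mean


def normal_qq_points(samples):
    """Pair each sorted sample with the matching standard-normal quantile."""
    ordered = sorted(samples)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (k + 0.5) / n keep the probabilities inside (0, 1).
    theoretical = [std_normal.inv_cdf((k + 0.5) / n) for k in range(n)]
    return theoretical, ordered


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares fit y = slope*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    # Synthetic "delays": exact quantiles of a normal law with mean 5, std 2.
    source = NormalDist(mu=5.0, sigma=2.0)
    delays = [source.inv_cdf((k + 0.5) / 50) for k in range(50)]
    xs, ys = normal_qq_points(delays)
    print(least_squares_line(xs, ys))
```

For exactly normal data the points fall on the fitted line, whose slope and intercept estimate the standard deviation and the mean; systematic S-shaped deviations from the line indicate tails that differ from those of a normal distribution.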

5.4 Varying Incoming Traffic for One AF Class 

In the preceding simulation, the AF sending rate has been varied for all AF 
classes. Now the rate is fixed to 100% of the reserved rate, and only one class at 
a time varies from 50% to 200%. 

Fig. 13. Packet delay and drop rate while only one AF class varies.

Figure 13 shows the packet delay and the drop rate when only one incoming
rate (AF1y or AF3y) varies, and confirms the results obtained above. In
addition, a clear drop precedence differentiation between all three
precedences, which disappeared in the previous simulation, is now
distinguishable again. This suggests that delay depends on the number of queues
in the system. This has not been tested in this paper and is left for future
work.

5.5 Comparison to the Basic RED Algorithm 

In a network environment with severe oversubscription and thus offered loads
that by far exceed the transmission capacity, RED has turned out to be
insufficient for efficient congestion indication if the number of TCP
connections is high or traffic does not behave in a TCP-friendly way. Here we
compare DRD congestion avoidance using a virtual scheduler with traditional
RED. The total offered load for the simulation has been set to 120% of the
available rate. Packet drops are counted during the 100 sec of simulation time.
Table 3 shows that forced packet drops, known as tail drops, have been
significantly reduced using DRD and a virtual scheduler, and amount to less
than 2% of all dropped packets. With RED, more than two thirds of all dropped
packets are lost because of buffer overflow. The conclusion is that DRD with
virtual scheduling has an excellent potential for efficient, early congestion
avoidance.

Fig. 14. Comparing packet tags for different AF under heavy load.



Table 3. RED vs. DRD with virtual scheduler.

       Tot. offered rate    Early drops    Tail drops
RED    11.98 Mbit/s         44,639         92,459
DRD    11.71 Mbit/s         120,074        2,391
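
The two percentages quoted in the text can be checked directly from the drop counts in Table 3. The short sketch below does this arithmetic; the dictionary layout and function name are ours.

```python
# Drop counts copied from Table 3.
DROPS = {
    "RED": {"early": 44_639, "tail": 92_459},
    "DRD": {"early": 120_074, "tail": 2_391},
}


def tail_drop_share(scheme: str) -> float:
    """Fraction of all dropped packets that were tail (forced) drops."""
    counts = DROPS[scheme]
    return counts["tail"] / (counts["early"] + counts["tail"])


if __name__ == "__main__":
    for scheme in DROPS:
        print(f"{scheme}: {tail_drop_share(scheme):.2%} of drops are tail drops")
```

The RED share comes out above two thirds and the DRD share below 2%, in agreement with the statements above.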

5.6 Packet Tags

As already mentioned, the new scheme needs more memory, mainly because
additional packet tags have to be stored. First of all, it has to be shown that
the number of additional tags is bounded. The set of tags in a queue $i$ is
given as $T_i$, and the subsets of real and virtual tags are $T_i^r$ and
$T_i^v$. The use of a second scheduler leads to an overall increase of packet
tags in the system. The set of extra tags is written as
$T_i^e = T_i^v \setminus (T_i^r \cap T_i^v)$. If only a real scheduler is used
then $T_i = T_i^r$; otherwise, i.e. with a virtual scheduler,
$T_i = T_i^r \cup T_i^v$. Figure 14

shows the numbers of real and virtual tags under heavy load in the AF1 queue
and in the AF3 queue. The sets of extra tags in the two queues differ
accordingly, and a nonempty set of extra tags causes the increase in the total
of packet tags. We found that $2\,|T^r| > |T|$ holds for all offered loads. As
these tags are small compared to a packet, the overall memory increase is
justifiable.
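
The tag bookkeeping can be illustrated with ordinary set operations. The sketch below assumes that the extra tags are the virtual tags that have no counterpart among the real tags, and it reads the reported bound as "the total number of tags stays below twice the number of real tags"; both readings, as well as the integer tag identifiers, are assumptions made for this example.

```python
def extra_tags(real: set, virtual: set) -> set:
    """Virtual tags that are not also present as real tags."""
    return virtual - (real & virtual)


def within_bound(real: set, virtual: set) -> bool:
    """Check the assumed bound: total tags < 2 * real tags."""
    total = real | virtual
    return 2 * len(real) > len(total)


if __name__ == "__main__":
    # Hypothetical tag identifiers for one queue.
    real_tags = {1, 2, 3, 4}
    virtual_tags = {3, 4, 5}
    print(extra_tags(real_tags, virtual_tags))    # tags only the virtual scheduler holds
    print(within_bound(real_tags, virtual_tags))
```

When the virtual tags are a subset of the real tags, the extra set is empty and no additional memory is needed for that queue.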

6 Conclusion 

In this paper we introduced a two-threshold-based buffer-management system 
that can be used for relative service differentiation in AF per-hop behaviors. 
The main new parts are the DRD congestion-avoidance scheme, internal packet 
marking, and a virtual scheduler. The DRD congestion avoidance scheme enables 
dynamic and relative service differentiation even with a simple scheduler such as 
FCFS. The fact that no delay differentiation is possible when used with FCFS 
led to the introduction of a virtual scheduler scheme. By means of simulations, 
it has been shown that a virtual scheduler is a robust management scheme for 
heavy and bursty traffic load. In conjunction with DRD, the scheme is able 
to perform relative delay differentiation of AF Diffserv per-hop behavior while 
guaranteeing minimum bandwidth and fair excess bandwidth allocation. The 
scheme avoids tail drops and, therefore, does not lead to TCP synchronization 
effects. Compared to other schemes, DRD with a virtual scheduler uses only
a few parameters (per-queue and per-flow thresholds, queue priority and queue
weight) that are set at initialization time, and then requires no further tuning. 

Packet marking is an important enhancement to flow-and-queue threshold- 
based buffer-management systems which allows the implementation of at least 
two drop precedences in a queue. In addition to optimizing overall buffer usage, 
packet marking is even necessary to avoid bursty packet drops. Mixing
responsive and non-responsive flows in the same queue can have a significant
impact on inter-flow fairness; studying this influence would exceed the scope
of this paper and is left for future work.

Abstract. Proposed as a scalable solution to provide Quality of Service (QoS)
in networks, the Differentiated Services (DiffServ) architecture enables service
providers to offer each customer a range of services that are differentiated on
the basis of performance. However, up to now little work discusses how to 
provide effective communication support for multimedia applications over the 
differentiated services network. In this paper we present a new enhanced 
communication approach at the end system to support distributed multimedia 
applications. Two new mechanisms are highlighted in our approach: enhanced 
communication stack to support differentiation within one flow, and the 
network-awareness provision for applications at the end system. Our approach 
improves the capability of both error resilience and flexible rate control for 
transmitting compressed multimedia bitstreams, particularly those using 
scalable coding technologies. We also develop an object-based video 
transmission system as an application instance to take advantage of the 
enhanced communication approach. Experimental results demonstrate the 
effectiveness of our proposed methods. 



1. Introduction 

The current Internet provides best-effort (BE) service to end-users and does not
make any service quality commitment. However, most multimedia applications are
sensitive to the available bandwidth and the delay experienced in the network.
To satisfy these requirements, two frameworks have been proposed by the IETF:
Integrated Services (IntServ) [1, 6] and Differentiated Services (DiffServ) [10].

The IntServ model provides per-flow QoS guarantees, and RSVP (Resource
Reservation Protocol) [5] is suggested for resource allocation and admission
control. However, the processing load of maintaining the state of thousands of
flows is too heavy for backbone routers. The DiffServ model is designed to scale
to large networks and gives a class-based solution to support relative QoS. The
main idea of DiffServ is to minimize state and per-flow information in core
routers by placing all packets in fairly broad classes at the edge of the
network. Core devices perform differentiated aggregate treatment of these
classes based on the marking performed by the edge devices. A single byte in
each packet, called the DS byte (the Type of Service byte in IPv4 and the
Traffic Class byte in IPv6), is used for this; it can be set by the end station
or the ingress edge router to indicate the class of service desired. Since it is
highly scalable and relatively simple, the DiffServ model is a promising
candidate to dominate the backbone of the next-generation Internet in the near
future.

* Corresponding author. E-Mail: hswu@public.bjnet.edu.cn Tel: (86-10) 62785814-525; Fax:
(86-10) 62785933.
Mail address: Room 224 Central Building, Tsinghua University, Beijing 100084 P. R. China.

P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 235-244, 2001
© Springer-Verlag Berlin Heidelberg 2001

H. Wu, H.-R. Shao, and X. Li

One of the most important research topics in multimedia networking is how 
various applications such as multimedia streaming and video conference can take full 
advantage of differentiation capability provided by DiffServ. Inherently, different 
kinds of information in the compressed bitstream may have different importance 
levels for the decoder to reconstruct the multimedia playback data. For instance, in 
MPEG4 shape and motion information is more important than texture for a P frame 
[15]. If shape and motion information is lost during transmission, the decoder cannot 
reconstruct the P frame successfully. However, if partial texture information is lost 
without the loss of shape and motion, it is still possible to reconstruct the P frame with 
acceptable quality [11, 15]. Another example is scalable coding such as layered 
approach [2, 4, 8] or Fine Granularity Scalability [14], in which enhancement layers 
are much less important than the base layer information. It seems that there is a 
natural mapping between different kinds of information with different classes of 
network packets, but unfortunately today's differentiation approaches can only 
support differentiation among flows, i.e., all packets belonging to the same flow have 
to be mapped to the same packet class. To satisfy the requirements of scalable 
multimedia, we propose an enhanced communication stack that can provide 
differentiation within one flow. 

Currently, layered transmission and multicasting approach for scalable media can 
also differentiate base layers and enhancement layers by using multiple IP sessions. 
However, it is complicated for network and end system to maintain multiple sessions 
semantically bundled. Particularly, it is difficult to control the synchronization 
between these semantically bundled layers [2, 15]. On the other hand, with the 
functionality of differentiation within one flow, multiple-layer information can be 
transmitted within one IP session. Moreover, different kinds of information within 
one layer can also be easily differentiated in our approach. 

Network-awareness support at the end system is another key issue for multimedia 
transmission over DiffServ network. With the development of network technologies 
and users' requirements, today's multimedia applications (such as MPEG4 
applications) become more and more complicated. It is required that resources should 
be allocated and managed in a cooperative way, which means network resources can 
be dynamically coordinated among applications and flows within the applications 
adaptive to network fluctuations. We propose network-awareness support at the end- 
system to solve this problem successfully. 

The rest of the paper is organized as follows. Section 2 presents the mechanism of 
differentiation within one flow. Section 3 discusses the network-aware support at the 
end system. In Section 4, we describe the experiments and the performance testing 
results of our approach. Section 5 introduces an application instance running on the 
proposed communication stack. Finally, we conclude this paper.

2. Differentiation within One Flow

2.1. Design of the Enhanced Communication Stack 

For complex multimedia applications such as MPEG4 programs, usually there are 
multiple objects of the same media or different media [15]. If these objects have 
different QoS requirements, generally each object is served by an individual IP 
session or even several sessions in the case of multiple layers. In general it is difficult 
for the end-system and network to maintain so many sessions for one application. By 
default, packets are marked based on a mapping from the service type associated with 
a flow, and all packets within one flow have the same marking value. However, if
differential marking within a flow is supported, layers belonging to the same
object or to different objects can be multiplexed into one IP session.

A possible problem resulting from differentiation within one flow is the
reordering of packets within the same flow, but this can occur even without
differentiation within one flow because of the connectionless nature of the IP
protocol. To support the new differentiation functionality, the protocol stack
of the end-system needs to be extended.




Fig. 1. Logical Architecture of the enhanced communication stack 

We propose a new marker mapping mechanism in the host protocol stack to
support differentiation within one flow. We introduce a multiple-queue mechanism
at the end-system, in which each queue buffers packets with a particular
priority. When the IP header is added at the IP layer, the priority is mapped to
the DSCP (DiffServ Code Point) in the DS byte.
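
As an illustration of how a priority ends up in the DS byte, the sketch below converts a DSCP value into the byte written into the IP header and hands it to the kernel through the standard socket interface. This is not the MSRIPv6-based mechanism described in this paper: `IP_TOS` is the IPv4 option and marks all packets of a socket, so changing the class per packet would require resetting the option between sends. The code-point table uses the standard values for BE, AF11, AF21, AF31 and EF; the function names are ours.

```python
import socket

# Standard code points; the DSCP occupies the upper six bits of the DS byte.
DSCP = {"BE": 0, "AF11": 10, "AF21": 18, "AF31": 26, "EF": 46}


def ds_byte(dscp: int) -> int:
    """Return the DS byte (IPv4 TOS / IPv6 Traffic Class) for a DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the kernel to mark outgoing IPv4 packets of this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, ds_byte(dscp))


if __name__ == "__main__":
    print(hex(ds_byte(DSCP["EF"])))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            mark_socket(s, DSCP["AF11"])
        except OSError as err:  # some platforms restrict or ignore IP_TOS
            print("could not set IP_TOS:", err)
```

Whether the mark is honored depends on the operating system and on the network; some platforms ignore or restrict this option for unprivileged applications.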

To realize this proposal, the priorities of packets are determined from the
DSCP values marked by QoS-aware applications, and the packets are then
classified into different classes according to a specific rule. All this can
only be performed in the TCP/IP stack within the operating system. An
application should call the kernel to set the DSCP value in the IP packet
header, and the kernel should carry out all subsequent queuing and scheduling.

After the kernel has sorted the packets generated by QoS-enabled applications
into different classes according to their DSCP values, the DiffServ scheduler in
the operating system sends out the different classes of packets according to a
certain scheduling algorithm. Priority Class Queues are thus used to implement
packet scheduling.
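
The idea of Priority Class Queues can be sketched as follows. The text does not fix the scheduling algorithm, so strict priority (always serve the highest non-empty class first) is assumed here for simplicity; the class name and the numbering of classes from 1 (lowest) to 4 (highest) are illustrative choices.

```python
from collections import deque


class PriorityClassQueues:
    """One FIFO queue per class; higher class numbers are served first."""

    def __init__(self, num_classes: int):
        # Index 0 is unused so that classes can be numbered 1..num_classes.
        self._queues = [deque() for _ in range(num_classes + 1)]

    def enqueue(self, packet, priority_class: int) -> None:
        if not 1 <= priority_class < len(self._queues):
            raise ValueError("unknown priority class")
        self._queues[priority_class].append(packet)

    def dequeue(self):
        # Strict priority (an assumption): scan from highest to lowest class.
        for queue in reversed(self._queues):
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    pcq = PriorityClassQueues(4)
    for name, cls in [("a", 1), ("b", 4), ("c", 2), ("d", 4)]:
        pcq.enqueue(name, cls)
    print([pcq.dequeue() for _ in range(4)])
```

Within a class the arrival order is preserved, so differentiation reorders packets only across classes.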

The logical architecture of the proposed system is shown in Figure 1. It shows a
host system, including applications, the operating system and hardware (network
adapter). In the operating system, the TCP/IP stack performs the proposed
mechanism and sends the resulting packets to the network adapter. It can be seen
from Figure 1 that the data path and the control path are separated, and the
control capability is improved.



2.2. Implementation of the Enhanced Communication Stack 

We implemented our enhanced communication stack based on the open source code 
of Microsoft Research IPv6 stack release 1.4 (MSRIPv6 release 1.4). 

Network protocols in Windows NT are dynamically loadable device drivers, much
like any other device driver in Windows NT. It is possible to add a new protocol to
the system by writing two new components: a kernel-level driver (tcpip6.sys) that 
exports the TDI interface and uses the NDIS interface, and a user-level helper 
(wship6.dll) to support access to the driver via sockets [7]. 

We made the needed modifications to MSRIPv6, most of which are within these
two components. Our main purpose is to add modules that set the Traffic Class
value in the IPv6 packet header and queue the packets according to their TC
values. As a prototype, this would not add to the overhead of the kernel.



3. Network-Aware End System 

Figure 2 shows a framework example of the streaming server with intelligent resource 
control and management for multimedia applications. This framework considers the 
transmission of multiple-object video programs and other types of media such as 
audio and data. Each video object is compressed first and corresponding elementary 
stream is generated. Then information within each elementary stream is classified 
based on importance and assembled into packets with different DiffServ classes. 
Network Monitor is responsible for estimating the available network bandwidth 
dynamically through probing or feedback-based approach. Packet Forwarder forwards 
the packets to the network. We do not discuss these two blocks in detail in this paper.
The other functional components are described as follows. 

• Priority Mapping and Marking Agent 

This component is responsible for the interaction between applications and the 
DiffServ networks. It assigns DSCP marks to packets and maps them to the 
corresponding DiffServ classes. 

• Application Collaborator 

The Application Collaborator is responsible for resource coordination among
multiple objects within one application and among multiple applications. It
receives information from Application Profiles, Remote User Interactions, and
the Network Monitor to make its decisions. In addition, the Application
Collaborator tells how to map packet priorities from individual encoders into
network classes. The receivers can interact with the server through user-level
signaling.

• Application Profiles 

This component records the semantic information of the applications such as which 
media and flows are included in an application and their relative importance levels. 

• Remote User Interactions: 

A user can interact with the video player or the server in several ways such as 
mouse clicking, mouse moving, fast forward, fast backward, object zoom-in, object 
zoom-out, add or delete. Some of these interactivity behaviors require dynamic 
adaptation of the bit rate of each video object and dynamic resource allocation 
coordination among multiple video objects. In object-based video multicast 
applications, different clients can have different views and interactions for the same 
video. 




Fig. 2. Multimedia Communication Framework in the End-system 



4. Performance Testing of the Enhanced Communication Stack 

4.1. Testing Environment 

The experimental testbed, an IPv6 testbed, is shown in Figure 3.

Box 1 is a PC running the Win2000 operating system and the MSRIPv6 stack. The
others run Linux (Redhat) and UNIX (FreeBSD). All the boxes are dual-stack,
i.e., they have both IPv4 and IPv6 stacks. The “Internet” in the figure is a
network comprised of several PCs, routers and switches. Box 4 is the default
gateway of box 1, box 2 and box 3. Box 5 is connected to box 4 via an
IPv6-over-IPv4 tunnel, which is a virtual link, unlike the physical links among
the other boxes.




Fig. 3. Testing Environment 

Box 1 is the sending machine and box 5 is the receiving machine in the tests. As
we modified only the sending module and left the receiving module untouched, we
only need to test the sending functions, and box 5 with UNIX installed should
not affect the test results.

With the original MSRIPv6 stack we are not able to set the value of the TC
field, but the TC value can be set with the modified stack. Thus the testing can
be performed under three conditions, as shown in Table 1.



Table 1. Testing Condition

                  Not setting TC (so TC=0)    Setting TC randomly
Original Stack    Condition 1                 Not available
Modified Stack    Condition 2                 Condition 3



We define condition 1 as “not setting the TC value with the original stack”,
condition 2 as “not setting the TC value with the modified stack”, and so on. In
the following paragraphs, we use “test 1” to represent “test carried out under
condition 1”, and likewise for “test 2” and “test 3”.

In the testing procedure, the sender, box 1, sends packets consecutively to the
receiver, box 5. Before sending, each packet’s TC field is marked with a
randomly chosen number, and the packet content is filled with the same value as
the TC field in its header.



4.2. Experiment 1: Delay 

One aspect considered is delay. We perform the test under each of the three
conditions, consecutively sending 40 packets and measuring the delay. The result
is shown in Figure 4 (left). At first the network link is relatively lightly
loaded. We find that the delays in tests 1, 2 and 3 show no fundamental
difference, and the delay of each packet is around 1 millisecond except for a
few outliers, as can be seen in the curves.

Fig. 4. Results of Delay Experimenting (Left: delay testing result without
congestion; Right: delay testing result in congestion condition)

Then we perform the same test under heavy link load. Several other services
that are assumed to consume substantial link resources, such as FTP, are started
to create link congestion. The result is shown in Figure 4 (right). This time
the delay of each packet and the variation within each condition are clearly
much larger than before. However, the results still show no distinction among
test 1, test 2 and test 3.



4.3. Experiment 2: Packet Loss 

The other aspect we consider is packet loss. The first question is whether
adding queues to the stack to schedule different classes of packets has a
negative effect on the performance of the stack. As described before, when
applications do not set the TC value, this field is left at zero, which is the
default value in the original stack. Thus, for the case of not setting the TC
value, i.e., condition 1 and condition 2, it is reasonable to compare the
results of the two IPv6 stacks.

We create congestion with traffic such as FTP to consume link resources. Some
packet losses do occur, but they are very small, because the methods available
in our testing environment are not sufficient to simulate the complex conditions
of a WAN. For both test 1 and test 2 we made 10 runs and counted the packet loss
in each run. The result is shown in Figure 5 (left).

As the figure shows, there is not much difference between the results of test 1
and test 2, which suggests that the queues added to the stack for class
scheduling do not have a large negative effect on the performance of the stack.
However, the testing environment and methods are very simple, and we cannot
control the sending speed of either the applications or the hardware (the
network card). The result in this figure therefore does not imply that our
modification would not affect the overall performance negatively under other
conditions.

Fig. 5. Results of Packet Loss Experiment (Left: packet loss of different
stacks; Right: packet loss of different classes; x-axis: Packet Class)



The other question is whether our mechanism can schedule different classes
correctly, especially under congestion. We need to show that, with the
scheduling mechanism, packets with higher priorities are sent out first under
congestion. We performed two test runs under condition 3. 100 packets with
randomly set TC values are sent from box 1 to box 5 over a network link with
very heavy load, and at box 5 the packet loss of each class is calculated. In
this test, we defined four classes, of which class 4 has the highest priority
and class 1 the lowest. The results of the two runs are shown in Figure 5
(right), from which we can see that the packet losses of class 3 and class 4 are
lower than those of class 1 and class 2. In the first run, the packet loss of
class 1 is 6% and that of class 2 is 5%, but the packet losses of classes 3 and
4 are both 2%. In the second run, the packet losses of classes 1 and 2 are both
5%, but that of class 3 is 2% and that of class 4 is 1%. This indicates that the
scheduling mechanism is effective in packet scheduling.
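
Because each test packet carries its TC value in the payload as well as in the header, the receiver can attribute every received packet to a class. A per-class loss computation along these lines could look as follows; the function name and the sample data in the usage part are hypothetical and do not reproduce the measurements above.

```python
from collections import Counter


def loss_rate_per_class(sent_classes, received_classes):
    """Return {class: fraction of packets lost} for every class that was sent."""
    sent = Counter(sent_classes)
    received = Counter(received_classes)
    return {c: (sent[c] - received[c]) / sent[c] for c in sorted(sent)}


if __name__ == "__main__":
    # Hypothetical run: class labels as recorded at sender and receiver.
    sent = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
    received = [1] * 3 + [2] * 3 + [3] * 4 + [4] * 4
    print(loss_rate_per_class(sent, received))
```

Comparing these per-class rates between runs shows whether higher-priority classes consistently lose fewer packets under congestion.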



5. Application Instance Running on the Enhanced Communication System



It can be seen from Section 4 that the enhanced communication stack can
differentiate different classes of packets without much performance impairment.
Moreover, the enhanced stack can greatly benefit the transmission of scalable
multimedia.

We implemented an MPEG4 video streaming system and ran it on the proposed
enhanced communication stack within a simulated DiffServ network. In this system,
we propose a new bitstream classification, prioritization, and packetization scheme
in which different types of data such as shape, motion, and texture are re-assembled, 
assigned to different priority classes, and packetized into different classes of network 
packets provided by DiffServ. Taking advantage of the enhanced communication 
stack, our scheme distinguishes not only different kinds of frames, but also different 
types of information within the same frame. Readers can refer to [15] for details of
our new transmission scheme. Besides our scalable transmission approach, for the
sake of comparison we also implemented the traditional approach in which the
bitstream is packetized with no information re-organization/prioritization and all
packets have a fixed size (hOObytes). Figure 6 and 7 show the comparison results 
example of our approach and the traditional approach for Bream. It can be seen that 
our proposed transmission approach based on the enhanced communication is much 
better than the traditional one. 
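
The classification and packetization idea can be sketched in a few lines. The priority values, the data-unit type and the rule of one packet per frame and class are hypothetical choices made for illustration; the actual scheme is the one described in [15]. The only property taken from the text is that shape and motion data are treated as more important than texture data.

```python
from dataclasses import dataclass

# Hypothetical priority classes; a larger number means more important.
PRIORITY = {"shape": 3, "motion": 3, "texture": 1}


@dataclass
class Unit:
    frame: int       # frame number the data belongs to
    kind: str        # "shape", "motion" or "texture"
    payload: bytes


def packetize(units):
    """Group the data of each frame by priority class.

    Returns {(frame, priority): payload}, i.e. one packet per frame and class.
    """
    packets = {}
    for unit in units:
        key = (unit.frame, PRIORITY[unit.kind])
        packets[key] = packets.get(key, b"") + unit.payload
    return packets


if __name__ == "__main__":
    frame0 = [Unit(0, "shape", b"S"), Unit(0, "motion", b"M"), Unit(0, "texture", b"T")]
    print(packetize(frame0))
```

With such a grouping, the loss of a low-priority packet removes only texture data, while the shape and motion data of the same frame travel in a packet of a better-protected class.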

Fig. 6. Video quality comparison for Bream at 11.7% packet loss rate (actual bit rate=168kbps;
original bit rate=187kbps).




Fig. 7. Video frame example (Number: 150) Left: traditional approach Right: the proposed 
scheme. 



6. Conclusion 

In this paper we presented a new enhanced communication approach at the end 
system to support distributed multimedia applications. Two new mechanisms were 
highlighted in our approach: enhanced communication stack to support differentiation 
within one flow, and the network-awareness provision for applications at the end 
system. Our approach improves the capability of both error resilience and flexible rate 
control for transmitting compressed multimedia bitstreams, particularly those using 
scalable coding technologies. We have implemented a prototype and used an 
experimental testbed to study the effect of our approach. Experimental results 
demonstrate that our approach can be an effective means for packet classification and
scheduling for multimedia transmission. 

Abstract. This paper presents investigations into the three kinds of
per-hop behaviors of Diffserv networks. The investigations were carried 
out using Queen’s University IP Simulator v2.0 (QUIPS-II). In terms of 
packet delay and packet drop rate, the per-hop behaviors were observed 
under various traffic and network parameters. The results show that the 
per-hop behaviors can efficiently differentiate between service levels on 
the Internet. 



1 Introduction 

Differentiated Services (Diffserv or DS) [214) architecture is proposed to provide 
scalable service discrimination on the Internet. It has recently become a promising
method to address IP Quality of Service (QoS) issues. Instead of per-flow
treatment in RSVP (Resource reSerVation Protocol) |Q, Diffserv provides QoS 
to each packet in the traffic stream. It uses the DS field (which was the TOS 
octet) in the IP header to distinguish QoS requirements of each packet. At the 
network boundary, IP packets are classified to a smaller number of aggregated 
flows, based on setting bits in the DS field. Within the core of the network, each 
aggregated flow is forwarded according to a particular forwarding treatment, 
called per-hop behavior or PHB, which is associated with the DS codepoint. In 
this way, different traffic can obtain different QoS treatments. By pushing complexity
to the network edge, this approach requires neither signaling nor per-flow
state maintenance within the core of the network. Thus it greatly reduces the 
network overhead, which in turn increases its potential for scalability. 

Sharing many features of the IETF proposals, we divide the Diffserv PHBs
into three kinds: the Expedited Forwarding (EF) PHB P], the Assured Forwarding
(AF) PHB p], and the Best-Effort (BE) PHB. The three forwarding
treatments can build three corresponding Internet services: the Premium service, 
the Assured service, and the default best-effort service. The Premium service is 
a low loss, low latency, low jitter, assured bandwidth service. Such a service 
can be used to create a “virtual leased line”, which greatly reduces the cost of 
building a separate network. The Premium traffic is characterized by a desired 
peak rate for a specific flow. The user contract with the network is not to exceed 
the peak rate. The network contract is that the contracted bandwidth will be 
available when traffic is sent. The Assured service provides a customer with the
assurance of a minimum throughput, even during periods of congestion. It allows
the customer to consume more bandwidth when the network load is low. The minimum
throughput equals the subscribed minimum rate, which is called the target
rate. This kind of service provides a “better effort” service by controlling the
drop preference of packets at the time of congestion. The default best-effort
service, by contrast, offers no QoS assurance. Its “first come, first served” rule
provides no guarantee on the loss, latency, jitter or throughput of a traffic
flow.

The objective of this paper is to investigate the three PHBs in a Diffserv 
environment, especially the EF PHB and the AF PHB. In order to carry out 
the investigation, we have developed the Queen’s University IP Simulator v2.0 
(QUIPS-II). By simulation, we can evaluate the performance of the PHBs. The 
rest of the paper is organized as follows. Section 2 presents the implementation
of Diffserv in QUIPS-II. Section 3 describes the simulation experiments and
results of the PHBs in a Diffserv network model. Finally, Section 4 provides the
summary of the paper.

2 Implementation of Diffserv 

Diffserv is implemented in every network node of QUIPS-II. This is because,
in the Diffserv Internet, it is the nodes, or routers, that are responsible
for handling packets of different traffic flows and applying different
treatments to them. The nodes can be separated into two categories: the edge
nodes and the interior nodes. Both types of node are able to forward packets
based on the DS codepoints which are associated with the PHBs. Moreover, the
edge nodes are also responsible for traffic conditioning when traffic is entering
or leaving a DS domain. In QUIPS-II, apart from the default BE PHB, each
node deploys both the EF PHB and the AF PHB. To keep the simulation model
simple, we do not subdivide the AF PHB into the multiple levels defined by
the IETF.




Fig. 1. The structure of a Premium marker 



Fig. 2. The structure of an Assured marker

When a packet arrives at the input interface of an ingress edge node, it is first
classified and then sent into a marker. Each traffic flow has an individual marker
to treat its packets. Hence there are three kinds of markers in the edge node, i.e.
the Premium marker, the Assured marker and the best-effort marker. A Premium marker,
as shown in Fig. 1, is actually a token bucket which is configured with the peak
rate of the Premium flow. An Assured marker, as shown in Fig. 2, is also a token
bucket, which is configured with the target rate of the Assured flow. A best-effort
marker, however, is only a marking mechanism without a bucket regulator. As
soon as a Premium packet sees a token present, it is forwarded after having its
DS field marked. If no token is available, the Premium packet is held until
a token arrives. Once a Premium flow bursts enough to overflow the holding
queue, the surplus packets are dropped. When an Assured packet emerges
from its Assured bucket, it is marked as IN-profile. The non-conforming Assured
packets, however, are not discarded immediately. They are demoted to best-effort
packets. In addition, all the best-effort packets are marked as OUT-of-profile
and sent into the network. After the traffic conditioning, a marked packet
is added to a certain behavior aggregate and sent to the output interface of the
edge node for further forwarding.
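The marking rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QUIPS-II implementation: the class names (TokenBucket, AssuredMarker, PremiumMarker), the byte-based token accounting and the explicit bucket depth are all hypothetical choices made for the example.

```python
from collections import deque


class TokenBucket:
    """Byte-based token bucket (illustrative): rate in bytes/s, depth in bytes."""

    def __init__(self, rate, depth):
        self.rate, self.depth = rate, depth
        self.tokens, self.last = depth, 0.0   # assumption: bucket starts full

    def take(self, size, now):
        """Refill up to `now`, then consume `size` tokens if available."""
        self.tokens = min(self.depth, self.tokens + self.rate * (now - self.last))
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class AssuredMarker:
    """Conforming packets are marked IN; the rest are demoted, not dropped."""

    def __init__(self, target_rate, depth):
        self.bucket = TokenBucket(target_rate, depth)

    def mark(self, size, now):
        return "IN" if self.bucket.take(size, now) else "OUT"


class PremiumMarker:
    """Packets wait for tokens in a bounded holding queue; overflow is dropped."""

    def __init__(self, peak_rate, depth, queue_limit):
        self.bucket = TokenBucket(peak_rate, depth)
        self.queue, self.limit = deque(), queue_limit

    def arrive(self, size):
        if len(self.queue) >= self.limit:
            return "DROP"
        self.queue.append(size)
        return "HELD"

    def release(self, now):
        """Forward every queued packet for which a token is available."""
        sent = []
        while self.queue and self.bucket.take(self.queue[0], now):
            sent.append(self.queue.popleft())
        return sent
```

The two markers differ only in what happens to non-conforming packets: the Assured marker re-marks them, while the Premium marker delays them and drops only on holding-queue overflow.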




Fig. 3. The output interface of each node 



Each output interface of a node employs a two-level priority queue mechanism.
The Premium packets are assigned to the high-priority queue, while the
Assured and the best-effort packets are assigned to the low-priority queue. The
high-priority queue has a simple “non-preemptive” priority over the low-priority
queue, which ensures that the Premium packets are sent first. As long as the
Premium traffic does not oversubscribe its allocation, it sees an empty or very
short queue. The priority queue mechanism can thus provide the EF PHB to a
Premium flow. A RIO active queue management mechanism is implemented on the
low-priority queue. RIO, which will be introduced next, is RED jSj modified to
handle a mix of IN and OUT packets. The active queue management mechanism can
support the requirement of the AF PHB. Fig. 3 is a block diagram of the output
interface for all the nodes in the network.
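The non-preemptive priority rule can be illustrated with a short Python sketch. The class name TwoLevelPriorityLink and its methods are hypothetical; the sketch assumes that the scheduler is consulted only when the link becomes idle, which is what makes the priority non-preemptive.

```python
from collections import deque


class TwoLevelPriorityLink:
    """Two FIFO queues served in strict, non-preemptive priority order."""

    def __init__(self):
        self.high = deque()   # Premium packets
        self.low = deque()    # Assured and best-effort packets

    def enqueue(self, packet, premium):
        (self.high if premium else self.low).append(packet)

    def next_packet(self):
        """Pick the next packet to transmit.

        Called only when the link is idle, so a low-priority packet that is
        already being transmitted is never interrupted by a Premium arrival.
        """
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

Because a Premium packet may arrive just after a low-priority packet has started transmission, it can wait for up to one packet-time even though its own queue is empty.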

The RIO algorithm can be viewed as a combination of two RED algorithms.
When a packet arrives, RIO determines if the packet is IN or OUT. If it is an
IN packet, the router only calculates the average queue size for the IN packets
(avgq_IN). Otherwise, the algorithm calculates the average queue size taking
into account all packets (avgq_TOTAL), regardless of their markings. The probability
of discarding an IN packet depends on avgq_IN, whereas the probability
of discarding OUT packets depends on avgq_TOTAL. Essentially, RIO discards
OUT packets first whenever it detects emerging congestion. If the congestion
persists, RIO will discard all the OUT packets and then begin to discard IN
packets, in the hope of controlling the congestion.
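A compact Python sketch of this drop decision follows. It is an illustration, not the simulator's code: the names RedState and Rio are hypothetical, the drop profile is the usual linear RED ramp between the two thresholds, and the drop-spacing counter of full RED is omitted for brevity. The averages are updated as described in the paragraph above (IN arrivals update only the IN average, other arrivals update the total average).

```python
import random


class RedState:
    """One RED instance: an EWMA of a queue length plus a linear drop profile."""

    def __init__(self, min_th, max_th, max_p, w_q):
        self.min_th, self.max_th, self.max_p, self.w_q = min_th, max_th, max_p, w_q
        self.avg = 0.0

    def update(self, queue_len):
        self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)


class Rio:
    """RIO as two RED instances that share one FIFO queue."""

    def __init__(self, in_params, out_params, rng=random.random):
        self.red_in = RedState(*in_params)
        self.red_out = RedState(*out_params)
        self.n_in = 0       # IN packets currently queued
        self.n_total = 0    # all packets currently queued
        self.rng = rng      # injectable source of uniform numbers in [0, 1)

    def enqueue(self, marking):
        """Return True if the arriving packet is accepted, False if dropped."""
        if marking == "IN":
            self.red_in.update(self.n_in)
            p = self.red_in.drop_probability()
        else:
            self.red_out.update(self.n_total)
            p = self.red_out.drop_probability()
        if self.rng() < p:
            return False
        self.n_total += 1
        if marking == "IN":
            self.n_in += 1
        return True

    def dequeue(self, marking):
        self.n_total -= 1
        if marking == "IN":
            self.n_in -= 1
```

Since the OUT decision is driven by the total queue length while the IN decision looks only at IN packets, OUT packets start to be dropped earlier as the shared queue builds up.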

3 Simulation Experiments and Results 

We simulated a Diffserv network as shown in Fig. 4. The network consists of
two edge nodes, each with two-level priority queues. One edge node performs
traffic conditioning on the incoming traffic, while the other applies none to the
outgoing traffic. In this study, we ignored the effects of interior nodes on PHBs
in order to simplify the model.

Fig. 4. The network model



There are three types of traffic coming into the network: the Premium, the
Assured and the best-effort. They compete for a bottleneck link with a bandwidth
of 45Mbps (5.625MBps). As in the example of [7], the assignment of the link
bandwidth to each Diffserv class is 20% for the Premium, 40% for the Assured,
and the remaining 40% for the best-effort. In the simulation, traffic is generated
by Constant Bit Rate (CBR) sources, with a variation of +/- 10% of the source
speed. The packet length is set to 1500 bytes for all simulation runs. As
the baseline situation, the size of the Premium holding queue in the edge node
and that of the high-priority queue are both set to 5 packets, while the size of the
RIO queue is set to 500 packets. RED has four parameters, namely the minimum
threshold min_th, the maximum threshold max_th, the maximum drop probability
max_p and the queue weight w_q. Hence, RIO has two sets of parameters, i.e.
<min_th_IN, max_th_IN, max_p_IN, w_q_IN> for the IN packets, and
<min_th_OUT, max_th_OUT, max_p_OUT, w_q_OUT> for the OUT packets. In the
network model, the RIO parameters are chosen as <225, 400, 0.02, 0.004> for the
IN packets and <200, 400, 0.05, 0.004> for the OUT packets.

In the experiments, the Diffserv PHBs were observed in terms of packet 
delay and packet drop rate, since the two aspects make the PHBs fundamentally 
different. The parameters to be varied in the simulation model are packet length, 
traffic load, and service allocation. Note that in each run of simulation, only one 
parameter is changed while the others remain the same as defined in the baseline 
case. Some other performance evaluation of Diffserv can be found in m 

3.1 Effect of Varying the Packet Length 

The loads of EF, AF and BE traffic classes in this experiment are all set to 
1. Here, the traffic load of a service is defined as the ratio of its incoming
traffic rate to the amount of bandwidth assigned to that service. For instance,
in the experiment, an AF traffic load of 1 corresponds to an incoming AF
traffic rate of 2.25MBps, which equals the assigned AF bandwidth, i.e. 40% of the
5.625MBps total link bandwidth.
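The load definition reduces to simple arithmetic, sketched below in Python. The function and constant names are hypothetical, and the sketch assumes that 1 MB means 10^6 bytes, which is what the 45Mbps = 5.625MBps conversion implies.

```python
LINK_MBPS = 45.0                                  # bottleneck link, megabits/s
SHARES = {"EF": 0.20, "AF": 0.40, "BE": 0.40}     # bandwidth assignment per class

link_MBps = LINK_MBPS / 8                         # 5.625 megabytes/s


def offered_rate_MBps(service, load):
    """Incoming traffic rate that corresponds to a given load for one class."""
    return load * SHARES[service] * link_MBps


def transmission_time_ms(packet_bytes, link_mbps=LINK_MBPS):
    """Time needed to serialize one packet onto the link, in milliseconds."""
    return packet_bytes * 8 / (link_mbps * 1e6) * 1e3
```

With the baseline values, `offered_rate_MBps("AF", 1.0)` gives 2.25 and an AF load of 1.2 corresponds to 2.7MBps; a 1500-byte packet takes about 0.267 ms to serialize at 45Mbps.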




Fig. 5. Delay behaviors as a function of packet length 



Fig. 5 illustrates a comparison of the packet delays experienced with different
packet lengths. At any packet length, the delays of the three traffic classes are
almost the same because no packet waits more than a packet-time inside its
queue. When the packet length increases, the three delay curves increase
linearly and in step. The reason is straightforward: the longer a packet, the
longer the transmission time it needs to traverse the network.

None of the three traffic classes experiences packet loss. This is
because under this network condition, there is no traffic congestion in the
network. Moreover, varying the packet length has no effect on the
packet drop rate.

3.2 Effect of Varying the AF Traffic Load

In this experiment, we observe the EF, AF and BE behaviors in response to the
change in AF traffic load. Here, only the AF traffic oversubscribes its
reserved bandwidth; the EF traffic load and the BE traffic load are kept at 1.

Fig. 6. Packet drop behaviors as a function of AF traffic load

Fig. 7. Packet drop behaviors as a function of AF traffic load (steady state)



Fig. 6 and Fig. 7 show packet drop behaviors as a function of AF traffic
load. As shown, the EF drop rate is 0 because EF traffic has just enough
bandwidth. However, the AF and the BE drop rates increase as the AF
traffic load increases. The BE drop rate is larger than the AF drop rate since
BE packets are dropped preferentially with respect to AF packets. In Fig. 6,
when the AF traffic load is smaller than 1.2, the average drop rates of both AF
and BE are negligible. This is because we have chosen a relatively
long queue and large thresholds to handle AF and BE packets, which prevent
packet dropping when the queue grows slowly. If observed at the steady state,
as shown in Fig. 7, the two drop rate curves begin to rise when the AF traffic
load is just over 1.0.




Fig. 8. Packet drop behaviors of IN and OUT packets 



[Plot: curves for IN packets and OUT packets; x-axis: AF Traffic Load]



Fig. 9. Packet drop behaviors of IN and OUT packets (steady state) 



The different packet drop behaviors of IN and OUT packets are shown in
Fig. 8 and Fig. 9. As introduced before, IN packets are the conforming AF
packets, whereas OUT packets are the non-conforming AF packets plus all the
BE packets. IN packets should be transmitted through the network with little or
no dropping, whereas OUT packets have no such assurance. The two figures
reflect exactly such behaviors.

Fig. 10. Delay behaviors as a function of AF traffic load

Fig. 11. Delay behaviors as a function of AF traffic load (steady state)



Fig. 10 and Fig. 11 compare the different delay behaviors of the three traffic
classes. In Fig. 10, when the AF traffic load increases from 1.0 to 1.2, the average
delays of AF and BE do not increase significantly. However, when the AF traffic
load is higher than 1.2, the two delay curves begin to rise sharply. In this
situation, the low-priority queue fills up quickly, which causes a large increase
in both the AF average delay and the BE average delay. At an AF traffic load
larger than 1.6, the two curves flatten again and increase only slowly.
Drawn at the 90th percentile, Fig. 11 shows that when the AF traffic load is just
over 1.0, both the AF delay and the BE delay jump from 0.344 milliseconds up
to 141 milliseconds. No matter how much we increase the AF traffic load afterwards,
the two delay results do not change. This is because at the steady state, the
low-priority queue is already full. Therefore, all the newly arriving AF
and BE packets experience the same delay.

The EF delay curve in Fig. 10 is actually an increasing curve, although this
is not clearly visible in the graph. If two packets from two
different priority queues emerge at the same time, one packet has to wait while
the other occupies the common link. So, when the density of AF
traffic increases, the AF packets have more chances to occupy the common
link, causing the EF packets some extra delay. However, the increase is very
small, with an upper bound of one packet-time.

3.3 Effect of Varying the EF Traffic Load 

Here, we observe the performance of EF, AF and BE traffic classes under different
EF traffic loads. Fig. 12 and Fig. 13 show the performance results.




Fig. 12. Packet drop behaviors as a function of EF traffic load 



In Fig. 12, neither AF traffic nor BE traffic drops packets. However, the EF
drop rate increases almost linearly as its load increases. This is fairly
straightforward. First, both the AF traffic load and the BE traffic load are kept at 1.
Second, when the EF traffic load is larger than 1, the EF holding queue in the
edge node becomes full and begins to drop packets. The larger the EF traffic
load, the more EF packets it drops.

The delay results are shown in Fig. 13. The AF and BE delays
remain at about 0.344 milliseconds and are not affected by the EF traffic load.
However, when the EF traffic load is just over 1.0, the EF delay jumps abruptly
from 0.337 milliseconds to about 6.40 milliseconds and then saturates. The reason for
the jump is that each EF packet has to wait a fixed amount of time inside its
holding queue at the network edge.

3.4 Effect of Varying the EF Traffic Allocation 

Fig. 13. Delay behaviors as a function of EF traffic load

In this experiment, the effect of varying the EF traffic allocation on the delay
behavior is investigated. We vary the EF reserved bandwidth from 10% of the
link bandwidth to 50%, while keeping the AF reserved bandwidth at 40%. The
remaining bandwidth is left for BE traffic. The loads of AF, EF and BE traffic
flows are all set to 1. Fig. 14 shows the delay behaviors of the three traffic classes.
As the EF traffic allocation increases, the EF delay decreases until it
saturates at about 0.278 milliseconds. However, the AF delay and the BE delay
do not change significantly. The reason is that, when the EF traffic allocation
increases, the EF packets have a better chance of being transmitted when competing
with other packets.




Fig. 14. Delay behaviors as a function of EF traffic allocation

4 Summary

In this study, we used the QUIPS-II simulator to investigate the Diffserv PHBs in
an IP network environment. By varying parameters in the simulation model, we 
observed and compared the performance of three different kinds of PHBs. The 
results show that the Diffserv PHBs can efficiently differentiate between service 
levels on the Internet.

Abstract. In this article, we propose a new scheduling architecture
based on rate-proportional servers, which is able to support different
services in the DiffServ model by using a single service discipline. The
scheduling algorithms reduce the implementation complexity at
the network node in the sense that there is no need to deploy a multi-level
scheduling architecture for multiple types of services and for link sharing.
The network operator is able to provide not only delay differentiation but
also other services, such as bandwidth reservation, link sharing or traffic
engineering.



1 Introduction 



A few years ago the essential Internet applications were mainly such elementary
services as e-mail, Web surfing or file transfer. In contrast, users today expect
that Internet service providers (ISPs) offer different services as well as price
patterns so that they can choose the one appropriate for them. Consequently,
service providers not only have to provision higher-capacity links, but also need
to introduce more sophisticated network architectures, which can satisfy the varied
requirements of different customers.

An evolutionary approach to provide service differentiation in the Internet is 
the DiffServ model |2|. The main goal of DiffServ is to overcome scalability prob- 
lems in the IntServ model fp. Instead of providing service to individual flows, 
DiffServ supports only a limited number of classes of service. Flows belonging to 
the same class receive the same service from the network. A router in the DiffServ 
model has only to focus on traffic aggregates of classes of service, thus reducing 
complexity. In this work, we focus on a research direction of the DiffServ model 
called Relative Service Differentiation (RSD). In the Relative Service Differen- 
tiation approach, traffic from a higher priority class receives better (or at least 
not worse) services than lower classes in terms of queueing delays and packet 
losses f3iII5ltij . Services offered in RSD networks are per-hop rather than on an 
end-to-end basis. A well-known model for the Relative Service Differentiation is 
the Proportional Differentiation Model. Quantitatively, the service offered to class i
in the Proportional Differentiation Model is defined as follows Pj. Let $q_i$ be the
performance measure for class i, e.g., delay or loss; then the following equation
holds for all classes of service:

$$\frac{q_i}{q_j} = \frac{c_i}{c_j}, \quad \forall i, j \in \{1, \dots, N\} . \qquad (1)$$

where $c_1 < c_2 < \dots < c_N$ are the generic quality differentiation parameters,
which are selected by the network operator. Early work by Dovrolis et al. [2] outlined
the definition and main issues of relative service differentiation. It also introduced
two scheduling algorithms for delay differentiation, namely the Backlog Proportional
Rate scheduler (BPR) and the Waiting Time Priority scheduler (WTP).
In our paper, we argue that DiffServ networks must facilitate not only delay
differentiation but also other kinds of service performance at the same time. Besides
delay, customers who use the DiffServ network for data transmission, for example,
would require another service such as throughput differentiation. Furthermore,
users from different organizations or groups of service, that is, groups demanding
delay performance vs. groups demanding throughput performance, require
that the DiffServ network provide a link-sharing mechanism to control the load
distribution between them. In this article we present another WFQ-like service
scheduler that can maintain service differentiation as accurately as
WTP and provides a mechanism for throughput differentiation and link sharing
at the same time. The rest of the article is organized as follows. Section 2 discusses
some remaining issues with the existing schedulers and the main reasons
that motivate us to develop a delay and throughput differentiation model
integrated in a single scheduler. Section 3 describes our scheduling algorithms.
A performance study is presented in Section 4. The last section concludes the work
and outlines possible further research in this direction.



2 Previous Work on Relative Differentiation Service Schedulers

Several approaches to the Proportional Differentiation Model have been proposed
so far mm-. In this article, we focus on the BPR and WTP schedulers proposed in P|. In
our view, besides delay, which is important for real-time applications, throughput
is another fundamental performance metric for best-effort services. Moreover,
link sharing is an essential mechanism that allows the network operator to effectively
control traffic load between service types, protocol families or multiple agencies,
or to carry out traffic engineering. Thus, a DiffServ network should support
different services simultaneously, such as delay and throughput differentiation,
as well as perform link sharing. Starting from this point, we now
examine the advantages and disadvantages of the WTP and BPR schedulers and
how they can support various kinds of service performance.

WTP is a priority scheduler in which the priority of a packet increases
proportionally with its waiting time. The priority of a packet in queue i at time t
is defined as follows:

$$p_i(t) = w_i(t)\, s_i . \qquad (2)$$

where $w_i(t)$ is the waiting time of the packet in queue i at time t and $\{s_i\}$ is the
set of Scheduler Differentiation Parameters. The WTP scheduler, on the one hand, is
found to be consistent in approximating the proportional delay differentiation model
defined in Equation 1 under different load conditions and traffic patterns. On the
other hand, because it decouples delay from service rate, it is difficult to maintain
other types of service performance, e.g., throughput differentiation, at the
same time without using a multi-level scheduling architecture. For example, in
WTP a user has no guarantee of gaining more bandwidth by using a higher
class of service. Thus, the WTP scheduler is consistent in delay differentiation but
not in bandwidth differentiation.
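The WTP selection rule can be sketched as follows in Python. The function name and the dictionary-based interface are hypothetical; the sketch assumes that only the head-of-line packet of each queue needs to be compared, since within a FIFO queue the head packet has waited longest.

```python
def wtp_select(head_arrivals, sdp, now):
    """Pick the class whose head-of-line packet has the highest WTP priority.

    head_arrivals: {class_id: arrival time of the head packet, or None if empty}
    sdp:           {class_id: scheduler differentiation parameter s_i}
    now:           current time t
    """
    best, best_priority = None, -1.0
    for cls, arrived in head_arrivals.items():
        if arrived is None:
            continue                              # empty queue, nothing to send
        priority = (now - arrived) * sdp[cls]     # p_i(t) = w_i(t) * s_i
        if priority > best_priority:
            best, best_priority = cls, priority
    return best
```

For example, with s_1 = 1 and s_2 = 2, a class-2 packet overtakes a class-1 packet once it has waited more than half as long. The rule fixes the order of service only; it says nothing about how much bandwidth each class receives.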

In contrast with the Waiting Time Priority scheduler, BPR relies on a property
of Generalized Processor Sharing (GPS) systems |Z|: the delay of a packet
depends on the rate allotted to the packet’s session and the queue backlog of
that session at the time the packet arrives. BPR reallocates rates to its classes
of service proportionally to their backlog. Let $r_i(t)$ be the service rate assigned
to queue i at time t, and $q_i(t)$ be the backlog of queue i at time t. For two backlogged
queues i and j, the service rate allocation in BPR satisfies the proportionality
constraint:

$$\frac{r_i(t)}{r_j(t)} = \frac{s_i\, q_i(t)}{s_j\, q_j(t)} . \qquad (3)$$

$$\sum_{i=1}^{N} r_i(t) = R . \qquad (4)$$

where $\{s_i\}$ are the Scheduler Differentiation Parameters and R is the link capacity.
Since BPR is based on rate allocation, one can easily integrate link-sharing
policies and throughput differentiation into BPR. Unfortunately, BPR’s performance
in terms of proportional delay differentiation is remarkably worse than
that of WTP when the load distribution between classes of service is asymmetric Pj.
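Combining the proportionality constraint with the requirement that the rates of all backlogged classes sum to the link capacity R gives each backlogged class the rate R s_i q_i / (sum over j of s_j q_j). The Python sketch below computes this allocation; the function name and the dictionary interface are hypothetical.

```python
def bpr_rates(backlog, sdp, capacity):
    """Backlog-proportional rate allocation for the currently backlogged classes.

    backlog:  {class_id: queue backlog q_i(t)}
    sdp:      {class_id: scheduler differentiation parameter s_i}
    capacity: link capacity R
    Returns {class_id: r_i(t)} for backlogged classes only.
    """
    weights = {i: sdp[i] * q for i, q in backlog.items() if q > 0}
    total = sum(weights.values())
    if total == 0:
        return {}                                 # nothing is backlogged
    return {i: capacity * w / total for i, w in weights.items()}
```

With equal backlogs and s_2 = 2 s_1, class 2 receives twice the rate of class 1. Because the rates follow the instantaneous backlogs, a class with a small share of the load can see its rate swing widely from one moment to the next.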

In the next sections we describe another scheduling mechanism, which
we call the Differentiated Delay and Throughput Scheduler (DDTS). Our goal is
to develop a scheduling architecture that is able to perform delay and throughput
differentiation and link sharing simultaneously. The scheduling policies are
integrated in a single service discipline. Under different load patterns, the new
scheduling architecture should keep its delay differentiation performance
predictable and as close to the performance of the WTP scheduler as possible, thus
eliminating the shortcoming of the BPR scheduler.

3 Scheduling Algorithms 

In this section, we first define a service architecture for the Differentiated
Delay and Throughput Scheduler. We then present algorithms for
the two service differentiation models under DDTS, that is, delay and throughput.

The definition of DDTS is based on the concept of Packetized Generalized
Processor Sharing [Z1 and the link-sharing model presented in |8IDj. A DDTS
server can be logically represented by a three-level tree with a positive weight
$\phi_n$ ($\phi_n \le 1$) associated with each node n in the tree. The root node corresponds
to the physical link with capacity R and each leaf node corresponds to a class
of service with a queue (Fig. 1).

An Adaptive Bandwidth Scheduling for Throughput and Delay Differentiation

Fig. 1. Service architecture of DDTS

In the architecture, two groups of service are defined. The first group of
service (G1) is for throughput differentiation and the second one (G2) is for
delay differentiation. For the sake of simplicity, in DDTS the weight $\phi_i$ assigned
to a node i coincides with the percentage of total link capacity that node i will
take up in case all sessions are backlogged, that is:

$$\phi_{G1} + \phi_{G2} = 1 . \qquad (5)$$

$$\sum_{i \in G1} \phi_i = \phi_{G1} . \qquad (6)$$

$$\sum_{i \in B_{G2}} \phi_i = \phi_{G2} . \qquad (7)$$



where $\phi_{G1}$ and $\phi_{G2}$ are the link shares of group 1 and group 2, respectively,
$\phi_i$ is the weight assigned to leaf node i and $B_{Gi}(t)$ is the set of all backlogged
sessions of group i at time t. It is worth noting that other types of service,
such as some classes of assured services, can easily be added to DDTS by using
the link-sharing model in Figure 1. In that case, several level-2 nodes are
added to represent the link shares of the new classes. From the implementation
point of view, the service architecture of DDTS is realized by a single scheduler,
which is able to carry out different policies for different classes of service. DDTS
comprises two components. The first is a work-conserving scheduler, which
can be implemented with any of the rate-proportional servers, e.g., WFQ jam,
WF2Q or SCFQ. We assume that readers are already familiar with these
rate-proportional algorithms. The second component is the measurement and
rate adaptation module. Its functions are to measure the average delays of the delay
differentiation classes and to adjust their service rates so that delays between
two adjacent classes of service are consistently spaced according to the Scheduler
Differentiation Parameters (Eq. 9), independent of the current load pattern.

3.1 Scheduling Policy for Throughput Differentiation



It is not difficult to perform throughput differentiation in DDTS due to its
rate-proportional nature. The throughput differentiation model is defined as below:

$$\frac{W_i(t)}{W_j(t)} = \frac{\phi_i}{\phi_j}, \quad \forall i, j \in G1 . \qquad (8)$$

where $W_i(t)$ denotes the service offered to class i ($i \in G1$) during the time interval
$(t, t + \Delta t]$, and $\phi_i$ is the weight pre-assigned to class i, such that
$\sum_{i \in G1} \phi_i = \phi_{G1}$.

Here, $\phi_i$ plays the role of the quality differentiation parameter. Under a
rate-proportional scheduler such as WFQ, WF2Q or SCFQ, Equation 8 holds for any
backlogged session at time t.



3.2 Scheduling Policy for Delay Differentiation 

Since packet delays are not directly related to rate allocation in rate-proportional
servers, it is more difficult to meet the requirements on delay differentiation.
Delay spacing between classes can easily deviate from the desired values, as
in the case of BPR 0. Let $s_i$, $i \in G2$, be the set of delay differentiation parameters.
In the delay differentiation model, the average delays perceived by packets in any
two delay classes are in the inverse ratio of the corresponding delay differentiation
parameters:

$$\frac{\bar{d}_i(t)}{\bar{d}_j(t)} = \frac{s_j}{s_i}, \quad \forall i, j \in G2 . \qquad (9)$$



The key concepts in DDTS are the introduction of the equivalent queue
length $q_i(t)$ and the adaptive differentiation parameter $\sigma_i(t)$ for delay class i at
time t. We rely on the idea of the BPR scheduler that the server allocates rates
to the classes of service according to the current queue length of each session
(see Equations 3 and 4), but instead of using the real queue length, we calculate the
equivalent queue length as follows:



qi{t) = ri{t)wi{t) = R= ^ • (10) 

where $w_i(t)$ denotes the waiting time of the head-of-line packet of class i at
time t, and $r_i(t)$ denotes the rate allocated to class i at time t. The idea behind the
equivalent queue length is that rate allocation in DDTS is directly associated
with the delay experienced by packets in the queue. Furthermore, in some
non-work-conserving systems, queue length is not directly related to the delay of
packets. Allocating rate according to queue length in these cases might not be
accurate.

The measurement and rate adaptation module measures average queueing
delays in the node and uses this information as a feedback signal to control the
rates of the delay classes so that delay spacing between classes of service is kept
more precisely, independent of the load pattern. Assume that packet
k from queue i leaves the system at time t. The average normalized delay over
all delay classes and the average delay of class i are calculated as follows. We
normalize the delay $d_i(t)$ of packet k of class i with the parameter $s_i$, then we calculate
the average value of the normalized delays over all delay classes. Let the average
delay of class i be $\bar{d}_i(t)$ and $A(x(t))$ be the average function of variable x at time t;
we have:

$$d_N(t) = A(s_i\, d_i(t)) . \qquad (11)$$

$$\bar{d}_i(t) = A(d_i(t)) . \qquad (12)$$



In DDTS, the server attempts to keep the normalized average delay of a class
as close to $d_N(t)$ as possible, that is, $s_i \bar{d}_i(t) \to d_N(t)$. The adaptive differentiation
parameter $\sigma_i(t)$ is defined as below:

$$\sigma_i(t) = \frac{s_i\, \bar{d}_i(t)}{d_N(t)}\, s_i = \frac{s_i\, A(d_i(t))}{A(s_i\, d_i(t))}\, s_i = \frac{A(d_i(t))}{A(s_i\, d_i(t))}\, s_i^2 . \qquad (13)$$

DDTS makes use of $\sigma_i(t)$ instead of $s_i$. Thus, rate allocation for delay differentiation
classes in Equations 3 and 4 becomes:

$$\frac{r_i(t)}{r_j(t)} = \frac{\sigma_i(t)\, q_i(t)}{\sigma_j(t)\, q_j(t)}, \quad \forall i, j \in G2 . \qquad (14)$$



and '^rft) = R= — (15) 

^ke{BGinBc2} j 

From the equations above it follows that, if $s_i \bar{d}_i(t) = d_N(t)$ then $\sigma_i(t) = s_i$.
Also, if $s_i \bar{d}_i(t) > d_N(t)$ then $\sigma_i(t) > s_i$. Consequently, the average delay of class i
tends to decrease in the next period. The adaptive differentiation
parameter $\sigma_i(t)$ in DDTS is thus a mechanism for providing consistent differentiation
between classes, independent of the load pattern. We will verify this in the
next section.

Note that different average functions A(.) can be used. In our approach,
we use the exponential moving average algorithm:

$$\bar{x}_k = w\, x_k + (1 - w)\, \bar{x}_{k-1} . \qquad (16)$$

where x is a variable in the time domain t, $x_k$ and $\bar{x}_k$ are a sample of x and the
exponential moving average at step k, respectively, and w is a weight with the constraint
0 < w < 1.
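The averaging rule and the adaptive parameter can be put together in a small Python sketch. It is an illustration under stated assumptions: the names Ewma and adaptive_parameter are hypothetical, the moving average is assumed to start from zero, and the adaptive parameter is taken to be s_i multiplied by the ratio of the class's normalized average delay to the average normalized delay, a reading that is consistent with the two properties stated above (equal to s_i when the ratio is one, larger than s_i when the class is lagging).

```python
class Ewma:
    """Exponential moving average: avg_k = w * x_k + (1 - w) * avg_{k-1}."""

    def __init__(self, w, initial=0.0):
        assert 0 < w < 1
        self.w, self.value = w, initial

    def update(self, sample):
        self.value = self.w * sample + (1 - self.w) * self.value
        return self.value


def adaptive_parameter(s_i, avg_delay_i, avg_normalized_delay):
    """sigma_i = s_i * (s_i * avg_delay_i / avg_normalized_delay).

    Assumed reading of the adaptive differentiation parameter: a class whose
    normalized delay exceeds the overall average gets a parameter above s_i,
    and therefore a larger share of the rate in the next period.
    """
    return s_i * avg_delay_i / avg_normalized_delay * s_i
```

With the weight w = 0.05, each new delay sample moves the average by only 5% of the gap, so the rate adaptation reacts to sustained drift rather than to individual packets.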



3.3 Complexity 

The measurement and rate adaptation module in DDTS is responsible for (1)
delay measurement, (2) weight calculation and (3) update of Finish Tags for
head-of-line packets. These operations have to be done each time a packet leaves
the system. While DDTS measures delay only for the queue that has the departing
packet, operations (2) and (3) need to be invoked for every active queue
in the system. However, these operations can be implemented efficiently,
since each queue in DDTS holds a traffic aggregate rather than a single flow
and the number of classes of service in DiffServ is normally limited to some
tens. DDTS has a complexity of O(n), where n is the number of classes of
service.

4 Evaluation of DDTS Algorithms 

In the previous section, algorithms for throughput and delay differentiation were
presented. We verify the algorithms through simulation experiments in this section.
Our objectives are to compare the performance of DDTS and WTP in the context
of delay differentiation under varied load patterns and link utilization, and
to evaluate link sharing, throughput and delay performance in the case of heterogeneous
traffic. All simulations are implemented under the ns-2.1b5 network simulator

ra- 

The first and second experiments compared the delay performance between 
DDTS and WTP. 40 Pareto sources are used to generate packets into a node 
with DDTS or WTP server. The server serves TV = 4 queues, one for each delay 
differentiation class. The Pareto sources have the average burst time 0.35s, the 
average idle time 0.65s, rate (in “ON” state) 32 kbps and the shape parameter 
1.9. We ran the simulations in 600 seconds and collected data from the last 400- 
second period, the first 200 seconds of the simulation are to bring up the system 
into steady state. The exponential moving average weight is set to w = 0.05 in 
all simulations. 
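
To make the traffic model concrete, the following sketch draws ON and OFF durations for one Pareto source and computes the mean load offered by all 40 sources. It assumes that both ON and OFF periods are Pareto distributed with the same shape, which is how such sources are commonly modelled but is not spelled out in the text; all function names are hypothetical.

```python
import random


def pareto_scale(mean, shape):
    """Scale x_m of a Pareto distribution with the given mean (needs shape > 1)."""
    if shape <= 1:
        raise ValueError("the mean is finite only for shape > 1")
    return mean * (shape - 1) / shape


def on_off_periods(count, mean_on=0.35, mean_off=0.65, shape=1.9, seed=1):
    """Return `count` (on_duration, off_duration) pairs in seconds."""
    rng = random.Random(seed)
    on_scale = pareto_scale(mean_on, shape)
    off_scale = pareto_scale(mean_off, shape)
    # paretovariate(shape) returns values >= 1, so scaling sets the minimum.
    return [(on_scale * rng.paretovariate(shape),
             off_scale * rng.paretovariate(shape))
            for _ in range(count)]


def mean_offered_load(sources=40, peak_rate_bps=32_000,
                      mean_on=0.35, mean_off=0.65):
    """Average aggregate rate: each source sends at peak rate while ON."""
    return sources * peak_rate_bps * mean_on / (mean_on + mean_off)


periods = on_off_periods(5)
load_bps = mean_offered_load()
```

With the stated parameters, the 40 sources offer about 448 kbps on average (40 x 32 kbps x 0.35).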

The first experiment was intended to test the performance of DDTS in terms of
delay spacing between classes under different link utilization. The load
distribution between delay differentiation classes is set to: Class-1: 40%,
Class-2: 30%, Class-3: 20%, Class-4: 10% of the total load. Two scenarios were
simulated: in the first scenario, the ratio $s_i/s_{i-1}$ is equal to 2 and in the
second one, it is equal to 4. In each scenario, we examined the performance of the
DDTS and WTP schedulers when the total traffic varies from moderate (75%) to
heavy load (99%). From Figure 2 it is noteworthy that the average delay ratios of
both schedulers deviated remarkably from the desired values in moderate loads,
while the proportional delay differentiation can be maintained more accurately
in heavy-load situations.

The second experiment aimed to investigate the delay ratios between two
successive classes under different load distributions. Similar to the first
experiment, two scenarios with the ratio $s_i/s_{i-1}$ equal to 2 and 4,
respectively, have been simulated. Link utilization in the second experiment is
fixed to 95% (Figure 3). There are 7 simulations in each scenario, in which the
load pattern between the four delay classes varied from symmetric to asymmetric
distributions. The four numbers in each bar from Figure 3 denote the load
distributions of the classes in percentage (the legend of these bar graphs is
similar to that in Figure 2).

(a) Scenario 1 (b) Scenario 2

Fig. 2. The average delay ratios between successive classes under DDTS and WTP 



Results derived from both experiments showed that in most cases, the
performance of DDTS is nearly the same as the performance of WTP. This general
trend is quite satisfactory.




Legend for both panels: leftmost bar - DDTS, rightmost bar - WTP



(a) Scenario 1 (b) Scenario 2 

Fig. 3. The average delay ratios under different load patterns 



In the last experiment, we investigated the throughput and delay performance of
DDTS when multiple types of traffic are present in the network. There are 8
classes of service in the simulated model: the first four classes are delay
classes while the remaining four classes are throughput differentiation classes.
The link shares of the delay and throughput differentiation classes are
$\Phi_{G_1} = 0.7$ and $\Phi_{G_2} = 0.3$, respectively. The output link capacity
is set to $R = 4$ Mbps, equivalent to 2.8 Mbps for delay classes and 1.2 Mbps for
throughput classes. Real MPEG video sources and TCP sources are used in the
simulation. Each video source consists of 40,000 MPEG frames (about half an
hour). Class 1 consists of 4 video traces: 1 Jurassic Park, 1 Starwars and 2 video
conferencing sessions; Class 2: one trace of German news; Class 3: 1 Asterix,
1 German news; Class 4: 3 video conferencing sessions. Each class from 5 to 8 has
one FTP/TCP Tahoe source. The link utilization is about 98%. The sets of delay
differentiation parameters $\{s_i\}$ and weights $\{\phi_i\}$ are displayed in
Figure 4.

Fig. 4. Delay and throughput performance for different services



The start times at which the sources begin sending data into the network are
randomly distributed: between 0 and 10 seconds for FTP sources and between 10
and 30 seconds for video connections. Delay differentiation on a short time scale
is maintained for real-time services, as illustrated in Figure 4a. For throughput
differentiation sessions, as depicted in Figure 4b, the FTP connections take
advantage of free bandwidth to send more data into the network when the video
sources are idle. As other sources begin transmitting packets, their allocated
bandwidths reduce gradually and converge to the nominal rates, independent of
the window sizes set for each TCP source. The ratios between bandwidths
allocated to successive classes are maintained independently of the load
situation.



5 Conclusion 

In the paper we developed a new scheduling mechanism called Differentiated De- 
lay and Throughput Scheduler. A combined service model is proposed, in which 
delay differentiation services and link sharing mechanisms coexist. Delay differ- 
entiation in DDTS is as accurate as in WTP. Our measurement-based approach 
used in DDTS enables the network node to maintain service differentiation be- 
tween classes of service, independent of load distribution between the classes. 
Furthermore, our approach supports an integration of delay and throughput dif- 
ferentiation classes and different link sharing strategies into a single scheduler. 
The advantages of the new scheduling mechanism are: 

— Simplicity, in the sense that there is no need to deploy a multi-level
scheduling architecture for multiple types of services and link sharing, which
reduces the implementation complexity at the network node.

— Flexibility, meaning that by use of DDTS, the network operator should be 
able to provide not only delay differentiation but also other services, such 
as bandwidth reservation for assured services, load control for best-effort 
services, link sharing or traffic engineering. 

We would like to emphasize that our work is still far from complete. The
introduction of throughput differentiation classes in the article is an example
of rate-proportional allocation and link sharing rather than a new class of
service, since higher throughput allocated to a class does not mean that a single
flow belonging to that class will receive more bandwidth than flows belonging to
“worse” classes. Here, users might need to deploy an adaptation mechanism to
dynamically adjust the flow’s class in order to achieve an appropriate quality
of service, which is a subject for future research.



Abstract. New applications with different bandwidth and delay guarantee
requirements have been introduced to today’s “best-effort” IP networks. The
IETF is currently focused on Differentiated Services as the architecture to
provide Quality of Service to IP networks. Towards this effort, an overlay
Resource Control Layer on top of a Differentiated Services core network is
introduced in this paper, in order to provide a simple control plane architecture
that enables the overall handling of network resources and the configuration of
network elements in a domain. A dynamic algorithm is proposed for that layer to
manage, adjust and distribute resources efficiently and dynamically. The
simulation results show that this algorithm provides a significant improvement in
bandwidth assurance and utilization of network resources compared with a static
resource assignment approach, while keeping complexity at a low level.



Introduction 

The Internet today provides a best-effort architecture, which is basically ideal for 
elastic applications, such as e-mail and file transfer. The network traffic though has 
increased as the number of users and applications has also increased. Moreover, the 
Internet traffic has also changed in character; new bandwidth-demanding and delay- 
sensitive applications (voice-over-IP, IP-telephony, video-conferencing) require or at 
least benefit from Quality of Service (QoS) [1,2] or other form of prioritisation that 
guarantees an Internet connection. Increasing bandwidth is not always sufficient to 
accommodate these increased demands. QoS mechanisms provide expected and 
predefined service guarantees by better managing the available bandwidth. 

The Differentiated Services (DiffServ) architecture [3,4, 5, 6] is nowadays the 
preferred architecture, which can address quality of service issues in IP networks. It 
provides a coarse and simple way to categorize and prioritize network traffic (flow) 
aggregates, leaving complexity at the “edges” and keeping the “core” network simple 
enabling its scalability. Edge devices (ED) in this architecture perform packet 
classification, policing, shaping and marking in order to ensure that individual user’s 
traffic conforms to the specified traffic profiles and aggregate traffic into a small 
number of prioritized classes. Core routers treat packet aggregates with
Per-Hop-Behavior (PHB) [7,8] according to their markings. PHB is the forwarding treatment
that a packet receives at a network node. The concept of the Bandwidth Broker (BB) 
Architecture [9,10] was proposed by Internet2 in order to provide an overall resource 
management, policy-based admission control and configuration of specific network 
elements (leaf, core and border routers). 

Our proposed architecture is based on the DiffServ and BB [11,12] concept. It is 
basically a realization of a distributed BB architecture, promising scalability and 
efficiency. Consequently, an additional layer on “top” of the DiffServ architecture is 
realized, the Resource Control Layer (RCL) as described in [13]. The RCL is
composed of different distributed entities each one assigned a specific task. The 
algorithm realized and evaluated in this paper is responsible for the resource 
management performed by the RCL. The rest of the paper is structured as follows: in 
the following part an outline of the proposed architecture is presented. In the last 
section, the implemented algorithm is shortly described and evaluated. 



Motivation & Proposed Architecture 

The architecture proposed aims at an efficient management and distribution of 
resources between the different nodes of a DiffServ architecture. This is basically 
realized by the proposed algorithm implemented in this layer, which achieves a good 
utilization of network resources. The architecture is fully analyzed in [13], and here 
its main functionality is described. It is composed of three logical entities. To start 
with, the Resource Control Agent (RCA) is the highest control entity in an 
administrative domain and is responsible for configuring the appropriate network 
entities and managing the network resources. Moreover, it has the overall view of the 
policies enforced in a domain and decides for the management of bilateral Service 
Level Agreements between adjacent administrative domains. Second, the Admission 
Control Agent (ACA) performs admission control based on the traffic profile between 
the user and the network. In this way, it controls the access of the user to the network 
and performs authorization and usage metering (accounting) functions. Last, the End- 
User Application Toolkit (EAT) provides a graphical interface to end-user 
applications and enables them to signal their requirements to the QoS infrastructure. 
The above logical entities can be distinguished in Fig. 1.

In order for the RCA entity to manage more efficiently the resources distributed
among the network elements, a hierarchical architecture inside the RCA is
proposed. Therefore, instead of having a centralized resource management entity,
a distributed one is proposed, separating the network into sub-networks. Each
sub-area has its own initial resources, which are assigned according to traffic
load forecasts and/or results retrieved by a measurement-based platform. The
structure of the RCA is depicted in Fig. 2.

The resources assigned to the administrative domain (root) are distributed among 
the sub-areas, each one represented by a Resource Pool (RP). Moreover, each sub- 
area can also be further divided into sub-areas, forming the above hierarchical 
structure. Another reason for the creation of RPs is the correct management of 
bottleneck links and the efficient sharing of its bandwidth between the RPs of the 
lower level. The Resource Pool Leafs (RPLs) correspond to the resources assigned to 
each ACA. Each ACA is based on those resources to perform admission control. The 
assignment of the resources is a top-down procedure, from the root of the tree down
to the RPLs. The left-hand side of Fig. 2 gives an example of RP creation based
on the network of Fig. 1, and the right-hand side a more complicated hierarchical
structure.




Fig. 1. RCL infrastructure




Fig. 2. Hierarchical structure of RCA

The initially assigned resources may not correspond to the actual traffic load.
Therefore, the RPLs/RPs are capable of adjusting and adapting those initial
resource assignments to real traffic conditions, which are difficult to forecast
and may change over time.



The Algorithm



The main target of the algorithm is to efficiently handle the re-distribution of 
resources. This is invoked when an RPL does not have enough resources to 
accommodate a new user request. Each RP and RPL is basically described by the 
following set of parameters: 

$R_{max}$ : upper limit of resources that can be assigned to an RP/RPL

$R_{tot}$ : current resource assignment to an RP/RPL

$R_{res}$ : current reserved resources of an RP/RPL

$R_{free}$ : currently free resources of an RP/RPL

$R_{add}$ : maximum resources that can be additionally assigned to an RP/RPL

Equations (1)-(6) describe the initial resource status of an RP/RPL as well as
the relation of the resources of a father RP and its children (f: father, c: children):



$$R_{max} \ge R_{tot} \ge 0 \qquad (1)$$

$$R_{free} = R_{tot} - R_{res} \qquad (2)$$

$$R_{add} = R_{max} - R_{tot} \qquad (3)$$


R' ^IR,, 


(4) 


R'_>R',^ 


(5) 


ir^>R ^ 


(6) 
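
The per-node state described by Equations (1)-(3) can be captured in a small data structure. This is only a sketch: the class and attribute names are hypothetical, and the father/children relations of Equations (4)-(6) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """State of one RP/RPL; free and additional resources are derived values."""
    r_max: float          # upper limit of resources that can be assigned
    r_tot: float          # current resource assignment
    r_res: float = 0.0    # currently reserved resources

    def __post_init__(self):
        # Equation (1): R_max >= R_tot >= 0.
        if not (self.r_max >= self.r_tot >= 0):
            raise ValueError("expected r_max >= r_tot >= 0")
        # Assumption: reservations never exceed the current assignment.
        if not (0 <= self.r_res <= self.r_tot):
            raise ValueError("expected 0 <= r_res <= r_tot")

    @property
    def r_free(self):
        # Equation (2): R_free = R_tot - R_res.
        return self.r_tot - self.r_res

    @property
    def r_add(self):
        # Equation (3): R_add = R_max - R_tot.
        return self.r_max - self.r_tot


pool = ResourcePool(r_max=10.0, r_tot=6.0, r_res=2.0)
```

Keeping the free and additional resources as derived properties avoids storing values that could drift out of step with the three base quantities.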



The network administrator is responsible for defining the initial resources to be 
distributed to the nodes of the tree. After this top-down start-up procedure, initial 
resources are assigned to all nodes of the tree. Subsequently, a user can make
resource reservation requests to the EAT, which forwards these requests to the ACA.
Provided that the user's access to the network is verified, the ACA hands over
the request to the corresponding RPL for admission control.

According to the algorithm realized, an RPL will make a request for additional
resources to its father when its current free resources are not adequate to serve a
new request. The child makes a request and the father is responsible for deciding
how many resources to give to its child, depending on the amount of resources
requested, the upper limit defined by the child ($R_{add}$) and the amount of its
free resources. If the father does not have enough resources, it will in turn make
a resource request to its own father RP (one level above). This procedure can
continue up to the root of the tree. The procedure of finding additional resources
is bottom-up, i.e. from the leaves of the tree up to the root.

A number of additional parameters must be defined for the realization of the 
algorithm: 

$R_{req}$ : minimum resources requested from an RP/RPL

$R_{recv}$ : resources actually received by a child after a request for more
resources to its father

$A_{max}$ : number of max resource shifts; father RP increases the resources of its
child by $A_{max} \times R_{req}$

$A_{med}$ : number of med resource shifts; father RP increases the resources of its
child by $A_{med} \times R_{req}$

$A_{min}$ : number of min resource shifts; father RP increases the resources of its
child by $A_{min} \times R_{req}$ ($A_{max} > A_{med} > A_{min} \ge 1$)

$p_l$ : a low limit for the free resources of the RP, $p_l < 1$

$p_h$ : a high limit for the free resources of the RP, $p_h < 1$ ($p_h > p_l$)

$p_l$ and $p_h$ determine two limits for the free resources of an RP. A low and a
high watermark are defined, corresponding to $p_l \times R_{free}$ and
$p_h \times R_{free}$.

As long as the RPL has enough resources to accept a reservation request, there is
no need to redistribute resources. If an RPL does not have sufficient resources to
accommodate a request $R_{req}$, it asks its father RP for more resources, and the
latter decides how much to give back to it. The same procedure can be repeated
many times, up to the root of the tree. The steps of the proposed algorithm executed
by the RPL after a resource reservation request are:

1. if $R_{res} + R_{req} > R_{max}$ then reject the request;

2. if $R_{res} + R_{req} \le R_{tot}$ then admit the request, $R_{res} = R_{res} + R_{req}$
   end then (2)

3. else if $R_{res} + R_{req} > R_{tot}$
   then calculate the resources to ask from the father: $R_{req}^{f} = (R_{res} + R_{req}) - R_{tot}$
   make a request to the father: $R_{recv} = request(R_{req}^{f}, R_{add})$
   if request accepted by father RP then admit the request and change total and
   reserved resources:
   $R_{tot}^{RPL} = R_{tot}^{RPL} + R_{recv}$, $R_{res}^{RPL} = R_{res}^{RPL} + R_{req}$ end then
   else reject the request; end then (3)
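
The three admission steps executed by an RPL can be sketched as follows. The `Leaf` class, the `admit` function and the `ask_father` callback are hypothetical names; the callback stands in for the request to the father RP and is assumed to return the amount granted (zero when the father refuses).

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    r_max: float        # upper limit of resources for this RPL
    r_tot: float        # resources currently assigned
    r_res: float = 0.0  # resources currently reserved


def admit(leaf, r_req, ask_father):
    """Return True if the reservation request r_req is admitted."""
    # Step 1: the request can never fit under the upper limit.
    if leaf.r_res + r_req > leaf.r_max:
        return False
    # Step 2: enough resources are already assigned to this leaf.
    if leaf.r_res + r_req <= leaf.r_tot:
        leaf.r_res += r_req
        return True
    # Step 3: ask the father for at least the missing amount.
    deficit = (leaf.r_res + r_req) - leaf.r_tot
    r_add = leaf.r_max - leaf.r_tot
    r_recv = ask_father(deficit, r_add)
    if r_recv >= deficit:
        leaf.r_tot += r_recv
        leaf.r_res += r_req
        return True
    return False


leaf = Leaf(r_max=10.0, r_tot=4.0, r_res=3.0)
first = admit(leaf, 1.0, lambda need, limit: 0.0)                    # fits locally
second = admit(leaf, 2.0, lambda need, limit: min(2 * need, limit))  # needs father
```

In the second call the leaf is short by 2 units, the stand-in father grants 4, and the leaf ends up with 8 units assigned and 6 reserved.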

If a father RP cannot assign to its child even the minimum amount of resources
requested, it requests resources from its own father in the same way. The father
RP uses the following algorithm in order to calculate the resources to give back
to its child. The father RP basically compares its low and high watermark of free
resources with a multiple of the resources requested. Depending on the result of
the comparison, it gives back an appropriate multiple of the resources requested.

1. if $A_{max} \times R_{req} \le p_l \times R_{free}$ then $R_{recv} = \min(A_{max} \times R_{req}, R_{add}^{c})$, $R_{res} = R_{res} + R_{recv}$
   return $R_{recv}$ end then (1)

2. else if $A_{med} \times R_{req} \le p_h \times R_{free}$ then $R_{recv} = \min(A_{med} \times R_{req}, R_{add}^{c})$, $R_{res} = R_{res} + R_{recv}$
   return $R_{recv}$ end then (2)

3. else if $A_{min} \times R_{req} \le R_{free}$ then $R_{recv} = \min(A_{min} \times R_{req}, R_{add}^{c})$, $R_{res} = R_{res} + R_{recv}$
   return $R_{recv}$ end then (3)

4. else ask resources from its father:
   $R_{req}^{f} = (R_{res} + R_{req}) - R_{tot}$
   $R_{recv}^{f} = request(R_{req}^{f}, R_{add})$
   if request accepted by father then $R_{tot} = R_{tot} + R_{recv}^{f}$, goto step (1) end then
   else reject the request; end then (4);
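
The father's decision can be illustrated with a short function. Because the printed conditions are partly illegible, this sketch rests on one reading of them: the requested amount, multiplied by the max, med or min shift, is compared with the low watermark, the high watermark and the free resources in turn. The function name, the default values (taken from Table 1) and the use of `None` to signal escalation to the next level are assumptions.

```python
def grant(r_req, child_r_add, r_free,
          a_max=5, a_med=3, a_min=1, p_l=0.2, p_h=0.6):
    """Amount a father RP hands to a child, or None if it must ask its own father.

    r_req:        minimum resources requested by the child
    child_r_add:  most the child may still receive (its R_add)
    r_free:       father's currently free resources
    """
    if a_max * r_req <= p_l * r_free:
        shift = a_max      # plenty of free resources: be generous
    elif a_med * r_req <= p_h * r_free:
        shift = a_med
    elif a_min * r_req <= r_free:
        shift = a_min
    else:
        return None        # step 4: escalate towards the root
    # Never give the child more than its upper limit allows.
    return min(shift * r_req, child_r_add)


generous = grant(r_req=1, child_r_add=100, r_free=100)
moderate = grant(r_req=10, child_r_add=100, r_free=100)
minimal = grant(r_req=50, child_r_add=100, r_free=100)
escalate = grant(r_req=200, child_r_add=500, r_free=100)
```

Handing out a multiple of the request when free resources are plentiful reduces the number of later requests, which is the trade-off against utilization examined in the simulations.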

When the ACA makes a release request to the RPL, the latter de-allocates the
corresponding resources and checks whether or not it can give back any free
resources to its father. In order to take this decision, an additional set of
variables is defined:

$l$ : a low limit for the reserved resources, expressed as a fraction of $R_{tot}$, $l < 1$

$R_{rel}$ : requested resources to be released

$R_{ret}$ : resources to be given back to the upper level

$a$ : determines indirectly the actual amount of resources to be returned, $a < 1$

The low watermark, $l \times R_{tot}$, is used to check the current status of the
reserved resources of an RP/RPL. If the reserved resources before the release are
above the low watermark and the resources after the release are below this
watermark, then an amount of free resources should be returned to the upper level.
The purpose of this double check is to make sure that an RP/RPL is not in an
initial state, where resource reservations have just begun. In that case its
reserved resources may not yet have exceeded the low watermark, so resources
should not be returned to the upper level. The amount of resources to be given
back should be calculated considering the trade-off between giving back as much as
possible and keeping resources for future use. This calculation is based on the
desired level of reserved resources between the total resources and the low
watermark. The value of $a$ determines this level.

The algorithm for deciding and calculating the resources to be returned is: 

1. After the release: $R'_{res} = R_{res} - R_{rel}$

2. if ($R'_{res} < l \times R_{tot}$) and ($R_{res} > l \times R_{tot}$)
   then give back resources to the upper level so that the reserved resources lie
   between $R'_{tot}$ and $l \times R'_{tot}$:
   $R'_{res} = a (R'_{tot} + l \times R'_{tot})$, where $R'_{tot} = R_{tot} - R_{ret}$
   From the above: $R_{ret} = R_{tot} - R'_{res} / (a (1 + l))$
   else do not give back resources
   end then (2)
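
The release rule can be checked with a small worked example. The function below is a sketch with hypothetical names; it follows the two steps above and uses the Table 1 values $l = 0.5$ and $a = 0.5$ as defaults.

```python
def resources_to_return(r_tot, r_res, r_rel, l=0.5, a=0.5):
    """Resources an RP/RPL gives back to the upper level after a release."""
    new_res = r_res - r_rel            # step 1: reserved resources after release
    watermark = l * r_tot
    # Step 2: return resources only when the release crosses the low watermark.
    if new_res < watermark < r_res:
        # Solve new_res = a * (1 + l) * (r_tot - r_ret) for r_ret.
        return r_tot - new_res / (a * (1 + l))
    return 0.0


# 100 units assigned, 60 reserved, 20 released: 40 is below the watermark of 50.
returned = resources_to_return(r_tot=100.0, r_res=60.0, r_rel=20.0)
new_tot = 100.0 - returned
```

Here roughly 46.7 units go back to the father, leaving about 53.3 assigned. With $a = l = 0.5$ the remaining 40 reserved units sit at 75% of the new total, halfway between the new low watermark and the new total.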



Simulation 

Simulations were carried out on a Pentium III PC with the help of a special tool
that has been developed in the Java programming language. In order to fully
understand the behavior of the algorithm, a tree structure has been defined and
implemented, as depicted in Fig. 3. The actual tree structure does not play a
crucial role for the study of the proposed algorithm.

A simulation experiment consists of a random process of reservation request
arrivals. Each request arriving at an RPL may be admitted or rejected according to
the specifics of the algorithm in question. The inter-arrival time of reservation
requests follows an exponential model, while each request asks for a standard
capacity of 128 kbps. Each leaf node has a weight, which determines the amount of
initial resources assigned to it. In a real network those initial resources could
be based on load forecasts. The load offered to the leaf nodes differs from the
forecast one in order to demonstrate the adaptability of the algorithm. While the
resources are distributed to nodes 1, 2, 3 with weights 0.5, 0.3, 0.2, the actual
offered load is correspondingly 0.5, 0.4, 0.1 for half of the simulation time and
0.5, 0.2, 0.3 for the rest of the time.







Fig. 3: Simulation topology 

In order to verify the performance achieved by the proposed algorithm, it is
compared to a static configuration, where the concept of resource pools is not
used. A fixed amount of resources is assigned to each ACA, which does not change
during the simulation. Moreover, the behavior of the proposed algorithm has been
examined under different sets of parameter values. Table 1 summarizes those
parameters and assigns a possible value to each of them.



Table 1. Main variables of the algorithm

Variable     Value
$A_{max}$    5 (3-8)
$A_{med}$    3 (2-4)
$A_{min}$    1 (1-3)
$p_l$        0.2 (0.2-0.5)
$p_h$        0.6 (0.5-0.8)
$l$          0.5 (0.5-0.7)
$a$          0.5



First, we examined the variation of $R_{tot}$ and $R_{res}$ over time for all the
RPs/RPLs of the tree, changing the values of the parameters in Table 1. In general
the algorithm offers exceptional adaptability, as indicated in Fig. 4 for an RPL
with $A_{max}$ set to 7. The adaptability of $R_{tot}$ to the reserved resources,
$R_{res}$, depends mainly on the values of $A_{max}$ and $l$. The greater the value
of $A_{max}$, the less adaptive the algorithm becomes, since a greater amount of
resources will be re-assigned to a child after a request() call. The value of $l$
determines the level at which resources are released, meaning that the greater its
value is, the sooner unused resources will be released to the upper level.

Next, the number of interactions among all nodes of the tree was examined
for different values of $A_{max}$. The simulations show that the greater the value
of $A_{max}$, the smaller the number of interactions.




Fig. 4: Status of Resources of a RPL 



Algorithm for Dynamic Resource Distribution in a Differentiated Services Network 273 




Fig. 5 : Utilization in relation to Amax 

Another crucial characteristic for the performance of the proposed algorithm is the
utilization of the network resources. The average utilization has been measured for
each leaf, varying the value of $A_{max}$ from 3 to 8, as illustrated in Fig. 5.
The algorithm provides a high utilization, which is inversely related to the value
of $A_{max}$. The current utilization of resources of each node also depends
directly on the value of $l$, since $l$ constitutes a lower bound for the
utilization.

The response of the algorithm to changes in the values of the other parameters
has also been examined. $A_{min}$, $A_{med}$, $p_l$ and $p_h$ also influence the
utilization and the number of interactions in the same way as $A_{max}$, but they
have a smaller impact than $A_{max}$. In addition, the behavior of the parameter
$a$ is identical to that of $l$, since they both determine the state in which the
release of resources should take place.

Finally, the number of rejected resource requests has been measured for the
proposed as well as the static algorithm, as depicted in Fig. 6. Nodes 1 and 3
under the proposed algorithm produce no rejections, while node 2 (RPL2) generates a
small number of rejections. The nodes under the static algorithm generate a number
of rejections that is proportional to the offered load. The proposed algorithm
clearly outperforms the static version, producing far fewer rejections, since it
achieves a dynamic resource distribution between the leaves of the tree.

Summarizing, there is a trade-off between the utilization of network resources and
the interactions between the nodes of the tree. When the main goal of the
implementation is a small number of interactions among the remote nodes for
improving the performance, then a relatively large value of $A_{max}$ is required.
Consequently, a lower utilization of network resources is achieved. It is up to
the network administrator to tune the value of $A_{max}$ and the other parameters
appropriately in order to achieve the desired performance. In addition, the
significant improvement in bandwidth assurance and resource utilization of the
proposed algorithm compared to a static version has been verified, while the
complexity is kept at a low level.




Fig. 6: Number of rejected resource requests 



Conclusions & Future Work 

The proposed realized algorithm uses some techniques in order to adapt efficiently 
and dynamically the resources of an RP/RPL to real traffic loads. The simulation 
results prove how this algorithm outperforms a static configuration, without a 
significant complexity burden. 

A management platform is under study in order to provide a graphical interface for 
the monitoring and configuration of the RPs. In addition new versions of the proposed 
algorithm are planned for the future in order to further examine the role of the 
different parameters as well as to tune their value more properly. 



Abstract. In this paper the performance and capacity gain achievable
with quality of service (QoS) management in packet switched radio net- 
works based on the General Packet Radio Service (GPRS) are examined. 
Both the functions defined in the GPRS specification for QoS support 
and implementation-specific strategies for subscriber- and application- 
based Connection Admission Control (CAC) and scheduling are intro- 
duced. Taking characteristic measures for a pure best-effort service as 
the basis the effect of these QoS functions on the throughput and delay 
is analyzed. To achieve this, simulation results of GPRS performance 
and system measures for different load situations are produced with the 
simulation tool GPRSim that models the realistic traffic behaviour of a 
GPRS network. 



1 Introduction 

In the context of the evolution towards 3rd Generation (3G) mobile radio net- 
works, packet switched data services like the General Packet Radio Service 
(GPRS) and the Enhanced GPRS (EGPRS) are presently introduced into GSM 
and TDMA/136 systems worldwide m- While in the first phase only best- 
effort data services without differentiating subscribers and applications will be 
supported, in the second phase quality of service (QoS) management functions 
will be integrated to be able to guarantee subscriber- and application-specific 
QoS requirements. For radio network dimensioning and network equipment fur- 
ther development the effect of these QoS management functions on the overall 
system performance has to be determined. This paper does not aim at optimized 
solutions for QoS management functions, but at simple proposals to be able to 
estimate the performance gain achieved with their introduction compared to a 
pure best-effort service. The focus lies on the radio network and not on the core 
network, since radio resources are scarce and representing the system bottleneck 
assuming that the core network is well dimensioned. 

In the next section the QoS functions in GPRS networks focussing on the ra- 
dio network are introduced. Next the simulation model GPRSim comprising the 
implementation of QoS support is presented. This is the basis for the simulation 
results given in the next section that figure out the performance gain through 
QoS functions in GPRS compared to simulations for a pure best-effort service. 

2 Quality of Service in GPRS Radio Networks

From a network operator’s point of view, the introduction of QoS-based traffic
management offers various advantages. Not only is it possible to utilize network
resources in a more efficient manner by treating application data flows with
respect to their actual needs - e.g., fetching an e-mail does not pose the same
strict requirements on the packet delay as an IP telephone call - but also to
differentiate between service users with respect to their subscribed QoS. Without
efficient QoS management, upcoming real-time applications like VoIP or video
conferencing cannot be supported at all. 3GPP has specified QoS requirements for
different service classes, namely Conversational, Streaming, Interactive and
Background. Table 1 gives an overview of these requirements.

In this paper only the problem of QoS provisioning in the radio network is 
considered. The serving host is assumed to be located in the operator’s domain 
and the core network is assumed to be well dimensioned. 

To define a QoS contract between the mobile station (MS) and the network,
Packet Data Protocol (PDP) contexts containing QoS profiles are negotiated
between the MS and the Serving GPRS Support Node (SGSN). The Base
Station Subsystem (BSS) is provided with a Packet Flow Context (PFC)
containing the Aggregate BSS QoS Profile (ABQP) and is responsible for resource
allocation on a Temporary Block Flow (TBF) basis and for scheduling packet
data traffic with respect to the negotiated QoS profiles. Moreover,
it regularly informs the SGSN about the current load conditions in the radio
cell. The tasks of the Gateway GPRS Support Node (GGSN) comprise mapping
of PDP addresses as well as classification of incoming traffic from external
networks regarding the downlink Traffic Flow Template (TFT). The GPRS Register
(GR) holds the QoS-related subscriber information and delivers it on demand
to the GSN. In Figure 1 a GPRS session is schematically outlined, depicting the
instances involved, messages exchanged, and parameters used for PDP context
(re)negotiation, PFC setup, and TFT installation.

From a time-scale point of view, the mechanisms for QoS management within
GPRS can be regarded as a three-stage model (see Figure 2). On PDP context
activation the QoS parameters are negotiated. As long as the PDP context
remains active, these parameters should be guaranteed unless there is a QoS 
renegotiation. The QoS profile is considered both for each TBF and for each 



Table 1. QoS requirements for selected services belonging to different traffic classes 



Traffic class   Medium  Application            Data rate       One-way delay
Conversational  Audio   Telephony              4-25 kbit/s     < 150 ms
                Data    Telnet                 < 8 kbit/s      < 250 ms
Streaming       Audio   Streaming audio (HQ)   32-128 kbit/s   < 10 s
                Video   One-way                32-384 kbit/s   < 10 s
                Data    FTP                                    < 10 s
Interactive     Audio   Voice messaging        4-13 kbit/s     < 1 s
                Data    Web-browsing (HTML)    -               < 4 s/page
Fig. 1. QoS negotiation and renegotiation procedures (example)



radio block period. At TBF setup, radio resources like a set of Packet Data 
Channels (PDCH) usable for this TBF are assigned according to the negotiated 
QoS parameters. During the TBF, radio blocks are scheduled at the BSS in 
competition with other existent TBFs in the radio cell. This scheduling has to 
be done considering the QoS profiles of the PDP contexts associated with the 
TBF. 
(1) PDP context activation (SGSN, CAC)

(2) Resource allocation at TBF setup (BSS, RLC/MAC)

(3) Scheduling of RLC/MAC blocks within a TBF (BSS, RLC/MAC)



Fig. 2. Three-stage model of QoS management 



3 Simulation 

In this section the simulation tool GPRSim, which is the basis for the performance
analysis, is introduced. This simulation environment makes it possible to ascertain
and optimize properties of different protocols of the (E)GPRS transmission
plane. In addition, it supports network capacity and quality of
service planning through performance evaluation in defined simulation scenarios. The
GPRS/EGPRS simulator GPRSim is developed as a pure software solution in
the programming language C++. Models of the Mobile Station (MS), Base Station
(BS), and Serving GPRS Support Node (SGSN) are implemented. The simulator
offers interfaces for extension by additional modules (see Figure 3). For
the implementation of the simulation model in C++ the Communication Networks
Class Library (CNCL) is used, which was developed at the Chair of Communication
Networks. It allows an object-oriented program structure and is
especially suited to event-driven simulations. The complex protocols such as
LLC and RLC/MAC, based on GPRS/EGPRS Release 99, and the Internet load
generators including TCP/IP and UDP/IP are specified with the Specification
and Description Language (SDL), translated to C++ by the code generator
SDL2CNCL, and finally integrated into the simulator.

Different from usual approaches to building a simulator, where abstractions 
of functions and protocols are being implemented, the approach of the GPRSim 
is based on the detailed implementation of the standardized protocols. This 
enables a realistic study of an (E)GPRS network. 

The functions not specified in detail in the GPRS specification are the CAC
policy and the scheduling strategy. The implementation of these components in
the GPRSim is described in the following.
Fig. 3. The GPRS/EGPRS Simulator GPRSim



3.1 Connection Admission Control 

In the simulation model PDP requests are differentiated on a subscriber basis
(Premium, Standard, Best-Effort (BE)) and on an application basis (Conversational,
Streaming, Interactive, Background). In this study only Interactive (WWW) and
Background (e-mail) are regarded, since these are the applications predicted for
GPRS in the next years. To avoid a total withdrawal of resources from the Standard
traffic classes with lower QoS requirements, e.g., other than Conversational,
a share of the pool of radio resources in the cell is reserved for this kind of traffic.
In general, all resources are open to traffic of any kind. In times of
high load, however, traffic flows with more demanding QoS requirements are allowed
to displace flows belonging to applications with lower QoS requirements,
but only up to a certain limit (see Figure 4, where P and I represent the appropriate
limits). When this limit is reached, the requested QoS is not granted,
but rather degraded to the next-lower-prioritized class.
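
The degradation rule described above can be illustrated with a minimal sketch. It is not the GPRSim implementation: the class list, the interpretation of each limit as the maximum number of resource units a class may hold, and the numeric values are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Classes ordered from most to least demanding (illustrative assumption).
CLASS_ORDER: List[str] = ["Premium", "Interactive", "Background"]


def admit(requested: str, demand: int, used: Dict[str, int],
          limit: Dict[str, int], total: int) -> Optional[str]:
    """Return the class actually granted, or None if the request is blocked.

    `limit[c]` is assumed to be the largest number of resource units class c
    may occupy; `total` is the size of the shared pool.
    """
    if sum(used.values()) + demand > total:
        return None  # the pool itself is exhausted
    start = CLASS_ORDER.index(requested)
    # Try the requested class first, then degrade step by step.
    for cls in CLASS_ORDER[start:]:
        if used[cls] + demand <= limit[cls]:
            used[cls] += demand
            return cls
    return None


# Hypothetical cell with 8 resource units and limits P = 4, I = 6.
limits = {"Premium": 4, "Interactive": 6, "Background": 8}
usage = {"Premium": 4, "Interactive": 0, "Background": 0}
granted = admit("Premium", 1, usage, limits, total=8)
```

In this example the Premium limit is already reached, so the request is granted at the next-lower class instead of being rejected.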
Fig. 4. Admission control policy (example)



3.2 Scheduling in the BSS 

Depending on the QoS profile negotiated, the BS RLC/MAC layer performs
scheduling of the radio blocks. The scheduling mechanism implemented for both
uplink and downlink direction follows a three-stage principle (see Figure 5).
First, incoming radio blocks are distributed into one of three queues according
to the QoS subscription associated with the respective traffic flow. A distinction
is made between Premium (“Gold Card”) service, Standard service and Best-effort
service. The second stage applies only to Standard service traffic. Based on
a packet’s application QoS profile, the appropriate traffic class queue is chosen
from Conversational, Streaming, Interactive, or Background. Best-effort traffic
from the first stage is put into a fifth queue. Within the traffic class queues packets
are scheduled according to their TBF by a Round Robin (RR) algorithm
with a depth of 20 radio blocks per scheduled TBF in the RR cycle. The third
Fig. 5. Principle of the scheduling function located in the BS RLC/MAC layer



stage is built by a simple priority mechanism, serving the traffic class queues in 
order from highest priority (Premium) to lowest priority (Best-effort). 
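
The three stages can be sketched as follows. This is a simplified illustration, not the simulator's code: the queue names, the placement of a separate Premium queue ahead of the four Standard traffic class queues, and the `BlockScheduler` interface are assumptions; only the RR depth of 20 blocks and the strict priority order are taken from the description above.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Stage 3 priority order, highest first (the exact queue set is an assumption).
PRIORITY = ["Premium", "Conversational", "Streaming",
            "Interactive", "Background", "Best-effort"]
RR_DEPTH = 20  # radio blocks served per TBF before the next TBF gets its turn


class BlockScheduler:
    def __init__(self) -> None:
        self.ring: Dict[str, Deque[int]] = {q: deque() for q in PRIORITY}
        self.backlog: Dict[str, Dict[int, int]] = {q: {} for q in PRIORITY}
        self.served: Dict[str, int] = {q: 0 for q in PRIORITY}

    @staticmethod
    def classify(subscription: str, traffic_class: str) -> str:
        """Stages 1 and 2: pick the queue from subscription and application."""
        if subscription in ("Premium", "Best-effort"):
            return subscription
        return traffic_class  # Standard traffic is split by traffic class

    def enqueue(self, subscription: str, traffic_class: str,
                tbf: int, blocks: int = 1) -> None:
        q = self.classify(subscription, traffic_class)
        if tbf not in self.backlog[q]:
            self.backlog[q][tbf] = 0
            self.ring[q].append(tbf)
        self.backlog[q][tbf] += blocks

    def next_block(self) -> Optional[Tuple[str, int]]:
        """Stage 3: serve the highest-priority non-empty queue, RR inside it."""
        for q in PRIORITY:
            ring = self.ring[q]
            if not ring:
                continue
            tbf = ring[0]
            self.backlog[q][tbf] -= 1
            self.served[q] += 1
            if self.backlog[q][tbf] == 0:      # TBF drained: drop it
                ring.popleft()
                del self.backlog[q][tbf]
                self.served[q] = 0
            elif self.served[q] == RR_DEPTH:   # turn used up: rotate
                ring.rotate(-1)
                self.served[q] = 0
            return q, tbf
        return None
```

With two Interactive TBFs holding 25 and 5 blocks, the first TBF is served for 20 blocks, then the second for its 5 blocks, then the first again; any Premium backlog is always served before either.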

4 Performance Evaluation 

4.1 Simulation Scenarios 

The cell configuration is given by the number of transceiver units (TRX) in the
radio cell. Here a typical 3-TRX scenario is regarded with 0 or 1 fixed and
8 or 7 on-demand Packet Data Channels (PDCH), respectively. The on-demand
PDCHs are shared with circuit-switched GSM traffic, which is offered corresponding
to an Erlang blocking probability of 1 %. This means that on average around
7 PDCHs are available for GPRS.

The channel conditions are determined by a constant RLC/MAC block error 
probability of 13.5 % corresponding to a C/I of 12 dB. Coding scheme
CS-2 is used.

LLC and RLC/MAC are operating in acknowledged mode. The multislot 
capability is 4 uplink and 1 downlink slots. The MAC protocol instances in the 
simulation model are operating with three random access subchannels per 52- 
frame. LLC has a window size of 16 frames. TCP/IP header compression in 
SNDCP is performed. TCP is operating with a maximum congestion window 
size of 8 Kbyte and a TCP Maximum Segment Size (MSS) of 536 byte. The 
transmission delay in the core network and external networks, i.e., the public
Internet, is neglected. This corresponds to a scenario where the server is located
in the operator’s domain. The session interarrival time is set to 12 seconds. The
Internet traffic is composed of 70 % e-mail sessions and 30 % WWW sessions
(see Table 2), independent of the subscription profile of the regarded MS. 10 %
of the mobile stations represent Premium subscribers and 90 % Standard
subscribers.

Table 2. Model parameters of Internet applications (HTTP and SMTP)

HTTP Parameter               Distribution          Mean
Pages per session            geometric             5.0
Intervals between pages [s]  negative exponential  12.0
Objects per page             geometric             2.5
Object size [byte]           log2-Erlang-k         3700

Amount of SMTP data          Distribution          Mean
E-Mail size [byte]           log2-normal           10000
Base quota [byte]            constant              300
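
As an illustration of how the HTTP part of Table 2 could drive a load generator, the following sketch samples the page structure of one session. It covers only the geometric and negative exponential parameters; the geometric distribution is assumed to start at 1, and the object size distribution is left out because its shape parameters are not given here. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def geometric(mean: float, rng: random.Random) -> int:
    """Geometric variate on {1, 2, ...} with the given mean (p = 1/mean)."""
    p = 1.0 / mean
    return 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))


def http_session(rng: random.Random) -> List[Tuple[int, float]]:
    """Return one (objects_on_page, interval_to_next_page_s) pair per page."""
    pages = geometric(5.0, rng)               # pages per session
    session = []
    for _ in range(pages):
        objects = geometric(2.5, rng)         # objects per page
        interval = rng.expovariate(1.0 / 12.0)  # mean 12 s between pages
        session.append((objects, interval))
    return session


rng = random.Random(1)
example = http_session(rng)
```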



4.2 Performance and System Measures 

As performance measures the downlink IP throughput per user during a data
transmission and the 95th percentile of the downlink IP packet delay are regarded.
These are the QoS measures that are noticed by the user and that can be compared
to the ETSI/3GPP QoS classes. For WWW and e-mail applications
the throughput per user is the important measure since it mirrors the response
time of a requested file.

The system measures comprise the downlink IP system throughput per radio 
cell and the downlink PDCH utilization, which is calculated by the total number 
of radio blocks carrying data or control information divided by the total number 
of transmitted radio blocks. The measures are presented over the number of 
mobile stations (MS) offering GPRS traffic. 
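
A small sketch of how two of these measures could be computed from simulation counters is given below. The function names are hypothetical, and the nearest-rank definition of the percentile is an assumption, since the estimator used in the simulator is not stated.

```python
from typing import Sequence


def pdch_utilization(data_blocks: int, control_blocks: int,
                     transmitted_blocks: int) -> float:
    """Blocks carrying data or control information over all transmitted blocks."""
    return (data_blocks + control_blocks) / transmitted_blocks


def percentile_95(delays: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (integer arithmetic)."""
    ordered = sorted(delays)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]
```

For example, 60 data blocks and 15 control blocks out of 100 transmitted blocks give a utilization of 0.75.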



4.3 Simulation Results 

Figure 6(a) shows the mean downlink IP system throughput per radio cell for
0 and 1 fixed PDCHs and with and without QoS management functions. The
difference between the curves with 0 and 1 fixed PDCHs is very small since
all PDCHs are allocated to circuit-switched calls only 1 % of the time. Since
the offered circuit-switched traffic is lower for the 1-fixed-PDCH scenario, the
system throughput is 1-4 % higher in the 0-fixed-PDCH scenario. As expected,
the system throughput in low load situations with fewer than 20 MS in the cell
is nearly the same for a best effort (BE) service and a service
with QoS functions. In higher load situations the system throughput
saturates. This can be explained by the fact that 5 % of the Background
sessions are terminated when no IP packets are received for a period of more
Fig. 6. System measures with and without QoS management functions



Fig. 7. Performance measures for different subscriber and application classes 



than 30 seconds. This does not occur in the BE simulations. The same effect
can be seen in Figure 6(b), where the channel utilization does not exceed
75 % in the results with QoS functions.
In Figure 7(a) the downlink IP throughput per user during transmission periods
is presented for the different service and subscriber classes Premium, Interactive and
Background, compared with simulation results for a pure BE service (without
CAC). In situations with low traffic load Standard users lose
15-20 % of performance compared to the BE service, while the Premium
user performance always remains higher than 15 kbit/s. In higher load situations
the service differentiation between Interactive and Background becomes visible.
While the throughput performance of the Interactive traffic does not fall below
10 kbit/s, the performance loss for Background applications is not visible in this
measure. Nevertheless, 5 % of the Background sessions are terminated because of
poor performance, as mentioned above. This can be avoided by using fairer scheduling
algorithms. The 95th percentile of the IP packet delay in Figure 7(b) shows
a similar effect.



5 Conclusions 

In this paper the capacity and performance gain achieved by quality of service
functions in the GPRS radio network, comprising Connection Admission Control
(CAC) and scheduling with subscriber and service differentiation, is examined.
Simulation results show that Premium users can be served with nearly constant
throughput and delay performance even if the number of active mobile stations
in the radio cell rises to 40. With QoS functions, 40 Interactive applications
instead of 12 in the pure best effort case can be served with a throughput
performance of 10 kbit/s, while the performance for Background users remains
acceptable even in high load situations. These results show that QoS functions
in GPRS networks increase the application-specific performance significantly and
make it possible to serve subscribers and applications with respect to their QoS
requirements.
Abstract. The aim of this paper is: (a) to define suitable traffic source models
for representing various types of data services/applications and (b) to study the 
performance resulting from the introduction of GPRS services in legacy infra- 
structures, through a generic simulation platform which takes into account air- 
interface, access and backbone network configuration aspects. The performance 
will be studied for a range of voice traffic loads, number of TRXs, offered ser- 
vices, number of dedicated PDCHs, etc. 



1 Introduction 

Data services like Web browsing, e-mail and file transfer are becoming more and
more popular in cellular systems. Up to now, GSM data transfer has been Circuit-Switched
(CS), i.e., physical resources are allocated to a user for the entire service
session. However, this is inefficient in the case of bursty traffic, where bursts are separated
by long intervals of inactivity. This has been the main reason for the introduction
of General Packet Radio Service (GPRS), which on the one hand acts as a mobile
access network to the Internet, while on the other hand it enables the operator to offer 
a wide variety of value-added services efficiently (WAP over GPRS, e-banking, e- 
commerce, push services, etc.) [1]. The aim of this paper is two-fold: (a) to define 
suitable traffic source models for representing various types of data ser- 
vices/applications and (b) to study the performance resulting from the introduction of 
GPRS services in legacy infrastructures, through a generic simulation platform which 
takes into account air-interface, access and backbone network configuration aspects. 
The performance will be studied for a range of voice traffic loads, number of TRXs, 
offered services, number of dedicated Packet Data Channels (PDCHs), etc. 
2 Simulation Platform

A generic GPRS simulation platform composed of four components (Fig. 1) has been
developed [2]:

1. GPRS Environment Representation (GER): Comprises libraries with traffic 
source models representing the service session behavior in terms of session dura- 
tion, packet calls/session in uplink/downlink, QoS (i.e., throughput, bitrate, session 
dropping, call blocking) in conjunction with user behavioral characteristics. 

2. GPRS Network Representation (GNR): Comprises models that correspond to
GPRS network elements (i.e., SGSN, GGSN, BG, CG), while the traditional GSM
elements (BTSs, BSCs, MSC/VLRs, HLR, etc.) incorporate GPRS-related functionality
(i.e., dynamic allocation of radio resources to CS and PS traffic).

3. GPRS Simulator Event Scheduler (GSES): It provides the means for represent- 
ing, storing, and manipulating (inserting, retrieving, deleting, etc.) the events that 
the core of the simulator will have to handle. 

4. GPRS Simulator Controller (GSC): It is responsible for the initialization of
GNR, GER and GSES and handles the communication between them. 
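
The role of the event scheduler (inserting, retrieving and deleting events) can be illustrated with a minimal priority-queue sketch. The class and method names are hypothetical and do not reflect the actual GSES interface; deletion is implemented lazily, which is one common design choice among several.

```python
import heapq
import itertools
from typing import List, Optional, Set, Tuple


class EventScheduler:
    """Time-ordered event list with insert, retrieve and delete operations."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._ids = itertools.count()
        self._cancelled: Set[int] = set()

    def insert(self, time: float, name: str) -> int:
        """Store an event and return an id that can be used to delete it."""
        event_id = next(self._ids)
        heapq.heappush(self._heap, (time, event_id, name))
        return event_id

    def delete(self, event_id: int) -> None:
        """Mark an event as cancelled; it is skipped when it reaches the top."""
        self._cancelled.add(event_id)

    def retrieve(self) -> Optional[Tuple[float, str]]:
        """Pop the earliest event that has not been deleted, or None."""
        while self._heap:
            time, event_id, name = heapq.heappop(self._heap)
            if event_id in self._cancelled:
                self._cancelled.discard(event_id)
                continue
            return time, name
        return None
```

The insertion counter also serves as a tie-breaker, so events with equal time stamps are returned in insertion order.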




Fig. 1. GSM/GPRS Simulation Platform 



3 Traffic Source Models 

The definition of a generic source model “representing” the service classes as defined
in [3] (conversational, streaming, interactive¹ and background²) is very difficult.
Since our studies focus on GPRS, only interactive and background service classes
have been considered.



3.1 Interactive Services 

A data session [4] consists of a sequence of packet calls, while a packet call is composed
of a bursty sequence of datagrams (Fig. 2).



¹ Interactive services are typical instant response request services (i.e., web browsing, database retrieval,
server access, polling for measurement records and automatic database enquiries).

² In background services the end-user sends and receives data in the background (i.e., background delivery
of e-mails, download of databases).




Fig. 2. Typical Characteristics of an Example Interactive Service

A detailed source model for interactive services is depicted in Fig. 3. As shown,
four states can be identified:

1. Active State (user active, host inactive): The MS sends a “packet uplink channel
request”. If a collision occurs, the MS retries using backoff. If no collision occurs,
the MS is assigned the uplink PDCHs that will be used for uplink transmission.

2. Waiting State (user inactive, host inactive): After the successful transmission of the
uplink request, the MS waits for the assignment of the downlink PDCHs, through
which the requested information will be transmitted.

3. Receiving Information State (user inactive, host active): A “packet downlink assignment
message” is sent to the MS comprising the assigned downlink PDCHs.

4. Reading State (user inactive, host inactive): Represents the time needed for processing
the downloaded information before making the new request.




Fig. 3. Traffic Source Model for Interactive Services 
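
The four states listed above can be written down as a small transition table. The event names used here (`collision`, `uplink_request_sent`, and so on) are illustrative labels chosen for this sketch and do not come from the GPRS specifications or from the simulation platform.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()      # user active, host inactive
    WAITING = auto()     # waiting for the downlink PDCH assignment
    RECEIVING = auto()   # host active, information is downloaded
    READING = auto()     # user processes the downloaded information


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.ACTIVE, "collision"): State.ACTIVE,            # retry with backoff
    (State.ACTIVE, "uplink_request_sent"): State.WAITING,
    (State.WAITING, "downlink_assignment"): State.RECEIVING,
    (State.RECEIVING, "download_complete"): State.READING,
    (State.READING, "reading_time_elapsed"): State.ACTIVE,
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an unexpected event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None
```

One full packet call corresponds to the cycle ACTIVE, WAITING, RECEIVING, READING and back to ACTIVE.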



3.2 Background Services 

The uplink transmission for background upload services is similar to that of interactive
services. The difference is that the MS performs the “packet uplink channel request”
only once and the uplink transmission is performed through the assigned
PDCHs. In the case of download background services, after the successful uplink request
transmission, the MS receives a “packet downlink assignment” message and starts
monitoring the assigned downlink PDCHs to identify the packets addressed to it.



3.3 Source Models Parameters 

Table 1 lists the traffic source model parameters and indicates the types of
service to which each of them applies.



Table 1. Traffic Source Models Parameters

Services Characteristics      Description                                  Interactive  Download    Upload
                                                                                        Background  Background
#packet calls/session (Npc)   Exponentially distributed with a mean μNpc   ✓            ✓           ✓
#datagrams within a           Follows Pareto distribution with cut-off     ✓            ✓           ✓
packet call (Nd)
Datagrams interarrival        Exponentially distributed with a mean μDd    ✓            ✓           ✓
time (Dd)
Datagram size (Sd)            Uniformly distributed variable               ✓            ✓           ✓
GPRS backbone and             Exponentially distributed with a mean μDgi.  ✓            ✓           N/A
Internet delay (Dgi)          Represents the time interval between a
                              successful transmission of an uplink
                              request and the arrival of the first
                              datagram of the downlink packet call
Reading time (Dpc)            Exponentially distributed with a mean μDpc   ✓            N/A         N/A

User Behavior
User tolerance time (Tut)     Constant variable; models user behavior      ✓            N/A         N/A
Max. #user retries (Nur)      Max. #user retries when user tolerance       ✓            N/A         N/A
                              time has expired
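
To make the distributions in Table 1 concrete, the following sketch draws one interactive session from them. The Pareto shape and minimum, the rounding of the exponentially distributed number of packet calls to an integer of at least 1, and the function names are assumptions made for illustration; the paper does not specify them at this point.

```python
import math
import random
from typing import List, Tuple


def pareto_cutoff(alpha: float, minimum: float, cutoff: float,
                  rng: random.Random) -> float:
    """Pareto variate with scale `minimum`, truncated at `cutoff`."""
    return min(minimum * rng.paretovariate(alpha), cutoff)


def sample_session(rng: random.Random, mean_npc: float, alpha: float,
                   nd_min: int, nd_cutoff: int, mean_dd: float,
                   sd_range: Tuple[int, int], mean_dpc: float
                   ) -> List[Tuple[List[Tuple[float, int]], float]]:
    """Return a list of packet calls: (datagrams, reading time).

    Each datagram is a pair (interarrival time in s, size in bytes).
    """
    n_calls = max(1, math.ceil(rng.expovariate(1.0 / mean_npc)))
    calls = []
    for _ in range(n_calls):
        n_datagrams = int(pareto_cutoff(alpha, nd_min, nd_cutoff, rng))
        datagrams = [(rng.expovariate(1.0 / mean_dd), rng.randint(*sd_range))
                     for _ in range(n_datagrams)]
        reading = rng.expovariate(1.0 / mean_dpc)
        calls.append((datagrams, reading))
    return calls


# Illustrative values only; the Pareto parameters are hypothetical.
session = sample_session(random.Random(0), mean_npc=30, alpha=1.1,
                         nd_min=1, nd_cutoff=200, mean_dd=0.001,
                         sd_range=(200, 700), mean_dpc=40)
```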



4 Case Studies 

1. CASE STUDY 1: Estimation of the maximum CAR for each offered data service 
for a range of voice traffic loads, while maintaining the QoS of the CS and PS 
connections under tolerable levels. 

2. CASE STUDY 2: Estimation of the required number of TRXs in a cell, given 
specific operator's requirements (CARs, services characteristics, etc.). 

3. CASE STUDY 3: Investigation of the impact of dedicated PDCHs on the overall
system performance. 
5 Results

The performance of a GPRS-capable network under various loading conditions and 
for various data services is investigated. The system performance is characterized by 
the max. CAR/data service that the system can support (for specific data services 
characteristics, voice traffic and TRX availability) provided that the QoS criteria are 
respected. Note that: (a) for interactive services abnormal session release (session 
dropping) occurs whenever the maximum number of user retries has been reached, 
while (b) for background services, abnormal session release may be initiated (either 
by the system or the user) whenever the “negotiated QoS” falls below certain limits 
e.g., the offered mean bitrate falls below a certain percentage (70%) and remains 
there for a certain time period (20 sec). 
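
The release criterion for background services in (b) can be sketched as a check over a series of bitrate measurements. Interpreting the 70% as a fraction of the negotiated mean bitrate is an assumption, as are the function name and the sampled input format.

```python
from typing import Iterable, Tuple


def should_release(samples: Iterable[Tuple[float, float]], negotiated: float,
                   fraction: float = 0.7, hold: float = 20.0) -> bool:
    """Return True if the bitrate stays below fraction * negotiated for `hold` s.

    `samples` is a time-ordered sequence of (time in s, offered mean bitrate).
    """
    below_since = None
    for time, rate in samples:
        if rate < fraction * negotiated:
            if below_since is None:
                below_since = time      # start of the degraded period
            if time - below_since >= hold:
                return True
        else:
            below_since = None          # bitrate recovered, reset the timer
    return False
```

A session whose bitrate recovers before the 20 s have elapsed is therefore kept, since the timer restarts at each recovery.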



5.1 Case Study 1 

5.1.1 Input Data 

#cells:                           1
#TRXs/cell:                       4
#Traffic Channels (CS and PS):    30
Mean call duration:               55 sec
Services characteristics:         Table 2
Voice call blocking probability:  1%
Session blocking probability:     1%
Session dropping probability:     2%
Voice CAR:                        60-1320 calls/h
Coding Schemes used:              CS1-CS2

Table 2. Services Characteristics

Services Characteristics  Web-browsing  Web-browsing  Download     Upload
                          Uplink        Downlink      e-mail       e-mail
Npc                       30            30            8            8
Nd                        5             40            20           20
Sd (bytes)                200-500       200-700       300-700      300-700
Dpc (sec)                 N/A           40            N/A          N/A
Dgi (msec)                40            N/A           40           N/A
Dd (sec)                  0.001         0.001         0.001        0.001
Max & Min PDCH            1             Min:1-Max:3   Min:1-Max:2  Min:1-Max:2
up- and downlink

User Behavior
Tut (sec)                 120           N/A           N/A          N/A
Nur                       3             N/A           N/A          N/A
5.1.2 Results

The results concern: 

1. The maximum CAR of the data services that the system can withstand for a range
of voice CARs. The results indicate that the CAR for data sessions decreases as
voice load increases. We also observe that the e-mail CAR (upload and download) is
higher than the Web-browsing one. This is explained by the fact that e-mail is
less demanding than Web-browsing since: (a) it requires fewer slots and (b) each
session consists of a low number of packet calls, thus leading to shorter session
durations. Moreover, the upload e-mail CAR is higher than that of download e-mail,
as the downlink is dominated by Web-browsing downlink traffic and
download e-mail.




Fig. 4. Max. Supported CAR for Offered Data Services versus Voice CAR 

2. The average number of slots allocated to data services and the mean bitrate for 
various voice CARs. According to the obtained results, the requested PDCHs for 
both uplink and downlink are actually assigned for medium and high voice traffic 
loads, while for low voice traffic (where the number of data sessions is high) the 
above is not always satisfied (downlink Web-browsing, download e-mail). Obvi- 
ously the latter affects the services mean bitrate. 

3. The percentage of slot utilization in uplink and downlink for both GSM and GPRS
services. The results show that the maximum achievable slot utilization for a voice-only
service (GSM) is 60.9% for both uplink and downlink, while for a combined GSM/GPRS
network the maximum achievable slot utilization in uplink and downlink is
63.95% and 77.78% respectively. The difference in uplink and downlink slot utilization
is explained by the asymmetric nature of the assumed services.
Fig. 5. Mean Bitrate (kbits/sec) for Various Voice CAR (Calls/h)




Fig. 6. Slot Utilization in Up-Downlink for GSM/GPRS Services for Various Voice CARs 
5.2 Case Study 2

5.2.1 Input Data 

The same as in case study 1.

Operator’s specific data:

Voice traffic load:                                   13.75 Erlangs
Web-browsing CAR:                                     0.45 calls/sub/h
Download e-mail CAR:                                  1.5 calls/sub/h
Upload e-mail CAR:                                    2.5 calls/sub/h
%GSM subscribers that are GPRS subscribers too:       15%
%GPRS attached subscribers during busy hour:          75%
%GPRS attached subscribers having an active PDP
context during busy hour:                             95%



5.2.2 Results 

The results show that 4 TRXs are not enough for satisfying operator's requirements 
(especially for Web-browsing), while 5 TRXs are more than enough as in this case 
the maximum supported Web-browsing CAR is 0.733 calls/sub/h. 







Fig. 7. Max. Supported CAR for Offered Data Services/Subscriber/Hour versus Voice Traffic 



5.3 Case Study 3 
5.3.1 Input Data 

— #Dedicated PDCHs: 0-4 

— Voice CAR=1200 calls/h 

— The rest is as in case study 1.
5.3.2 Results

We observe that as the number of dedicated PDCHs increases, both the CAR of data 
sessions and the voice call blocking probability increase. The data CAR increase is 
justified by the fact that at high voice traffic load, packet calls can still be supported 
efficiently due to the existence of dedicated PDCHs. The increase of the voice call 
blocking probability above 1 % when assuming 3 and 4 dedicated PDCHs, is justified 
by the fact that the traffic channels that could be utilized by CS (voice) connections 
(30-3=27 and 30-4=26 respectively) cannot satisfy the criterion of 1% blocking (ac- 
cording to Erlang-B formula) for the specific voice CAR (1200 calls/h). 
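
The Erlang-B argument can be checked numerically. The sketch below uses the standard recursive form of the Erlang-B formula and assumes the 55 sec mean call duration from the case study 1 input data, so that 1200 calls/h correspond to an offered voice load of about 18.3 Erlangs.

```python
def erlang_b(traffic: float, channels: int) -> float:
    """Blocking probability of an M/M/N/N system (Erlang-B), computed with
    the recursion B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1))."""
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = traffic * blocking / (n + traffic * blocking)
    return blocking


# Offered voice load: 1200 calls/h with an assumed mean duration of 55 s.
offered = 1200 * 55 / 3600

for dedicated in range(0, 5):
    channels = 30 - dedicated
    print(dedicated, channels, round(erlang_b(offered, channels), 4))
```

Under these assumptions the blocking probability stays below 1% with up to 2 dedicated PDCHs (28 voice channels) and exceeds 1% with 3 or 4 dedicated PDCHs, in line with the observation above.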




Fig. 8. Max. Supported CAR for Offered Data Services versus #Dedicated PDCHs 



6 Conclusions 

In this paper we have investigated the performance of a combined GSM/GPRS capa- 
ble network for a range of voice traffic loads, number of TRXs, offered services (both 
interactive & background ones), number of dedicated PDCHs, etc. The results have 
shown that: (a) Legacy cellular systems can withstand a significant volume of PS data 
traffic, (b) The introduction of GPRS leads to better slot (radio resource) utilization 
compared to a pure GSM network and (c) The use of dedicated PDCHs improves the 
quality levels provided to data sessions, but at the same time, at high voice traffic 
loads the call blocking probability may increase above tolerable levels (1%). 
Abstract. The enormous development of wireless multimedia commu-
nication techniques requires new modeling methods. In this paper a novel 
analytical technique is presented, to examine wireless networks with mul- 
tiple connection types that may change their generated traffic in time. 
By our method call-level system parameters can be calculated. We also 
propose two simple base station admission control policies and investi- 
gate the effect of these policies and the effect of reserved capacity for 
several connection types. 



1 Introduction 

One of the major directions of today’s telecommunications research is the in- 
tegration of high speed data communications with wireless access. Considering 
the rapidly increasing computational capacity of portable computers, the pos- 
sibility of using broadband multimedia applications is already given in mobile 
nodes. However, wireless networks are currently not capable of providing high 
data rates for such applications. 

The evolution of second generation mobile systems towards the third generation
(UMTS, IMT2000 family) ensures that packet switched data and multimedia
communications are beginning to spread in place of voice-oriented mobile
systems. Yet the transmission speed of wired LANs will not be available in 3G
systems.

Besides the development of 3G systems, considerable research is being carried out in
the field of wireless LANs and other broadband cellular networks. These systems
are not intended to provide wide-area coverage; rather, their aim is
similar to that of wired LANs. These radio networks (e.g. Hiperlan 2, broadband
extension of 802.11, wireless ATM LANs) are capable of providing several tens of
Mbps aggregated transmission speed. In such wireless networks any application
can run. These applications often do not generate traffic at constant speed, thus
treating them as constant bit-rate sources is not appropriate.

As we have seen, both commercial cellular systems and wireless
LANs are moving towards carrying heterogeneous, multimedia
traffic with reasonable bandwidth. For preliminary network design and network
dimensioning, analytical models are needed that describe the behavior of the
system considering connections that generate a variable amount of traffic over time.
To use the radio spectrum efficiently, wireless MAC protocols are needed that
allocate radio resources to a user according to its instantaneous need.

Recently a number of papers proposing models of cellular networks with mul- 
tiple service classes were published (e.g. P-0 and references therein). Despite 
the anticipatory significance of wireless multimedia communications these ana- 
lytical models of wireless networks do not take into account the versatility of a 
customer’s occupied capacity. Even in the case when multimedia connections are 
considered, their required capacity is characterized by the - constant - effective 
bandwidth. 

In most of the literature a customer is characterized by three time variables.
One of these is the dwell time, which is the time a mobile spends within a radio
cell while it is roaming throughout the area. The session length is the time a
connection lasts regardless of the terminal’s mobility. This means that the above
two variables are independent. The third variable is the channel holding time,
i.e. the time that a connection spends actively in a cell. This is usually calculated
from the first two.

Both the session length and the dwell time are often modeled with an exponential
distribution, although this assumption is only valid under specific circumstances.
Recently more sophisticated dwell time and session length distributions were 
suggested (e.g. 0 - 0 ). 

In this paper we propose a modeling framework for calculating performance 
parameters of cellular networks when customers are present that change their 
amount of occupied capacity during a connection. The model is based on a 
Markovian description of a single base station considered as a pool of a given 
amount of shared capacity. We present an approximate method that is appro- 
priate for the calculation of new call blocking probability, handover failure prob- 
ability and channel utilization. 

This paper is organised as follows. In Section 0 and 0 the general description 
of the modelling environment and the derivation of the service time distribution 
is given. In Section 2]and0 the abstract model of the base station and its solution 
method is investigated. This is followed by numerical results and conclusions. 

2 Mobile User Model 

The method presented here focuses on a single base station as if it were functioning
in isolation. Sessions can be admitted to the base station as new calls initiated
within the coverage area of the base station (referred to as new connections), or as
handover attempts from the vicinity of the cell (handover calls). We suppose that
the arrivals of new calls and handover calls are appropriately described as two
Poisson processes.

We assume that there are K customer types present in the system. Customers
belonging to different types may belong to different service classes; in this case
the characteristics of their generated traffic differ, as well as probably their



298 



P. Fazekas and S. Imre 



session lengths. Customers that belong to the same service class may belong 
to different types in the case when their mobility behavior is different, namely 
when their dwell times are different. As we show in the next subsection we must 
distinguish between handover calls and new calls as well. In this paper the superscript
or subscript H means handover call, while N denotes new call.

2.1 Residual Dwell Time and Session Length 

Consider the dwell time and the session duration of a type k customer.
When examining a single base station, the knowledge
of the dwell time and the session duration of each user is not sufficient. Rather,
we introduce the notions of residual dwell time and residual session length
to describe the difference between handover calls and new connections; both
are derived from the dwell time and the session duration. The residual session
length of a handover connection of class k is the period
between the call’s admission to the base station and the termination of the call,
regardless of the place of call termination. For connections initiated within the cell
it obviously follows from the definition that the residual session length is equal
to the session length. For handover calls this also holds if the session length is
exponentially distributed, due to the memoryless property of the exponential
distribution. The residual dwell time of a type k connection that is initiated in
the cell is the time interval between the time instant
the customer starts transmission in the cell and the instant it leaves the cell,
regardless of whether it has terminated its call or not. It is again obvious from
the definition that for handover calls this residual dwell time equals the dwell
time, and this equality holds for new calls as well if the dwell time is
exponentially distributed.

The derivation of the residual session length and dwell time is completely
out of the scope of this paper; therefore we suppose that the distributions of the
four descriptor times of each customer type, or at least statistics of these times,
are given. Our analysis requires each descriptor distribution to be approximated
by Phase Type distributions (PH). A phase type distributed time is a mixture 
of a number of exponentially distributed phases 0. In other terms, the PH 
is the time until a finite-state Markov chain reaches an absorbing state. The PH is
characterized by the initial probability vector of the Markov chain, the rates among
the states and the rates from each state to the absorbing state.
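
The three descriptors named above are enough to draw samples of a PH distributed time. The following is a minimal sketch, not part of the original method: the function and variable names (`sample_ph`, `ALPHA`, `RATES`, `EXIT`) and the Erlang-2 example are assumptions chosen for illustration only.

```python
import random

def sample_ph(alpha, rates, exit_rates, rng):
    """Draw one phase-type distributed time.

    alpha[i]      -- probability of starting in transient phase i
    rates[i][j]   -- transition rate from phase i to phase j (i != j)
    exit_rates[i] -- rate from phase i to the absorbing state
    """
    phase = rng.choices(range(len(alpha)), weights=alpha)[0]
    elapsed = 0.0
    while True:
        # Rates out of the current phase towards the other transient phases.
        out = [r if j != phase else 0.0 for j, r in enumerate(rates[phase])]
        total = sum(out) + exit_rates[phase]
        # The sojourn time in a phase is exponential with the total out-rate.
        elapsed += rng.expovariate(total)
        # Pick the next phase; the extra last index stands for absorption.
        nxt = rng.choices(range(len(out) + 1),
                          weights=out + [exit_rates[phase]])[0]
        if nxt == len(out):
            return elapsed
        phase = nxt

# Hypothetical example: Erlang-2, two phases in series, each with rate 2,
# so the mean of the PH time is 1.0.
ALPHA = [1.0, 0.0]
RATES = [[0.0, 2.0], [0.0, 0.0]]
EXIT = [0.0, 2.0]

def estimate_mean(n=20000, seed=1):
    """Monte Carlo estimate of the mean of the example PH distribution."""
    rng = random.Random(seed)
    return sum(sample_ph(ALPHA, RATES, EXIT, rng) for _ in range(n)) / n
```

With these example descriptors the estimated mean should be close to 1.0, the mean of two exponential phases of rate 2 in series.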

Studies show (e.g. ^ and references therein) that with an appropriately cho- 
sen PH any distribution can be approximated with arbitrary accuracy. Moreover, 
if a series of statistical data of a random variable is available, its (unknown) dis- 
tribution can be accurately approximated with a proper PH. 

Approximating the user-describing times with PH distributions has the advantage
that, while it enables arbitrary distributions to characterize customer behavior,
it still exploits the memoryless nature of the exponential distribution (the phases
of the PH last for exponentially distributed times). Thus standard queueing analysis
techniques can be used to examine the system, at the expense of a larger state space.

The variability of a customer's generated traffic is modeled by a finite
Markov chain. Each state of the chain is assigned a given transmission
rate. This means that the active user transmits with a certain bit-rate for an
exponentially distributed time, then it changes its transmission speed according
to the describing Markov chain. This is a widely accepted model of VBR traffic,
and with an appropriately chosen Markov chain the generated traffic of a VBR
connection can be characterized correctly. The probability that a connection
starts transmission with a given rate is described by the initial probability vector
of the Markov chain. In this paper the term variable bit-rate and the abbreviation
VBR do not mean the VBR class of ATM, but simply a connection type that
changes its transmission speed during a connection. Naturally, constant bit-rate
connection types may also be present in the system; the model is applicable to
such types without any modification.

Because a handover connection is already active when arriving, the initial
probability vector of its VBR describing Markov chain is not the same as for a
new call. It is reasonable to suppose that before the handover the session was active
long enough for the Markov chain to reach equilibrium. Thus the initial
probability vector of the VBR Markov chain of handover calls is supposed to be equal
to the steady-state distribution of the chain.
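
As an illustration of the last assumption, the steady-state vector of a small VBR chain can be obtained numerically and used as the initial vector of handover calls. The sketch below uses uniformization with power iteration; the function name and the two-state generator `VBR_Q` are hypothetical, and the chain is assumed to be irreducible.

```python
def steady_state(Q, iterations=10000):
    """Stationary distribution of a small CTMC with generator Q (rows sum to 0)."""
    n = len(Q)
    # Uniformization: P = I + Q / rate is a stochastic matrix with the same
    # stationary vector as Q. The factor 1.1 keeps every diagonal entry of P
    # positive, so the power iteration cannot oscillate.
    rate = 1.1 * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical two-rate VBR source: low -> high at rate 1, high -> low at rate 3.
VBR_Q = [[-1.0, 1.0],
         [3.0, -3.0]]

# Initial vector assumed for handover calls of this connection type.
HANDOVER_INITIAL = steady_state(VBR_Q)
```

For this example chain the source spends three quarters of the time in the low-rate state, so the handover initial vector is approximately (0.75, 0.25).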

2.2 Base Station Model and Call Admission 

The base station is modeled as a channel pool of $C_0$ units of capacity. In general,
the blocking of a handover attempt is less tolerable than the blocking of a new call
initiation. Furthermore, we suppose that some connection types
tolerate call blocking less than others. Therefore the base station does not share
its capacity equally among connection types: the capacity available to a type k
handover call is denoted by $C^{k,H}$, and that available to a type k new call by $C^{k,N}$.

In this paper we investigate two simple base station admission control policies 
that can be applied if there is not enough capacity to admit the connection with 
its actual transmission speed: 

— policy 1: the call with too high instantaneous transmission speed is immedi- 
ately blocked, 

— policy 2: the connection is not blocked, but the base station forces it to 
begin its transmission with a lower rate. The call begins transmitting with 
the highest among its possible rates that can be admitted to the base station. 
The call is only refused when the available capacity at the base station is 
less than the lowest possible transmission rate of the connection. Although
under this policy the connection attempt is not immediately refused,
the call has to transmit with a lower rate than it intended, so it may
suffer some degradation of its QoS parameters. For each connection type,
and for handover and new connections separately, either of the two policies may be applied.

Due to the existence of VBR connections, the amount of occupied capacity 
may change without the initiation of new or handover call or call termination. 
In the case when a mobile tries to switch to a higher bit-rate but there is not 
enough free capacity to do this, it is reasonable to assume that the call is not 
dropped but the user is able to continue its transmission with the previous rate. 
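
The two admission policies and the handling of a failed rate increase can be sketched as follows. This is only an illustration of the rules stated above; the function names and the representation of rates as plain numbers (e.g. kbps) are assumptions.

```python
def admit(rates, intended_rate, free_capacity, policy):
    """Return the rate at which an arriving call starts, or None if it is blocked.

    rates         -- possible transmission rates of the connection type
    intended_rate -- rate the call would like to start with
    free_capacity -- capacity currently available to this connection type
    policy        -- 1: block immediately, 2: start with the highest rate that fits
    """
    if intended_rate <= free_capacity:
        return intended_rate
    if policy == 1:
        return None
    # Policy 2: downgrade to the highest admissible rate, refuse only if
    # even the lowest rate does not fit.
    feasible = [r for r in rates if r <= free_capacity]
    return max(feasible) if feasible else None

def try_upswitch(current_rate, new_rate, free_capacity):
    """A rate increase that does not fit is ignored; the call keeps its rate."""
    if new_rate - current_rate <= free_capacity:
        return new_rate
    return current_rate
```

For example, with rates of 32, 64 and 128 and 100 units free, a call intending to start at 128 is blocked under policy 1 but starts at 64 under policy 2.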
3 Service Time Distribution

The aim is to derive a queueing model of the presented system. The input process 
is assumed to be Poisson and a service time distribution is needed to create a 
Markov chain description of a single base station. 

The service time is the time a connection spends in the system communicat- 
ing, i.e. its distribution is equal to the distribution of the channel holding time. 
From the definitions of the residual dwell time and session length, it is clear that 
the channel holding time is the following: 

$$ T_{C,H}^k = \min(T_D^k, T_{S,H}^k), \qquad T_{C,N}^k = \min(T_{D,N}^k, T_S^k) \qquad (1) $$

where $T_{C,H}^k$ is the channel holding time for a type k customer that arrived with
handover and $T_{C,N}^k$ is the channel holding time for type k calls initiated in the
cell.

Let the descriptors of the PH distributed dwell time of type k users be de- 
noted by and where = {D^j} contains the rate from phase i to 

phase j in its i,j -th position (nor i, nor j is the absorbing state), the ith element 
of is the probability that the PH random time begins with phase i and the 
vector contains the rates from each phase to the absorbing state (finish) 
of the process. According to the properties of Markov chains, D^j- 

Similarly, the parameters of are and , that of tr^s,h are ,1^’^ 

and and Tfi^D,N is characterized by and , 

The channel holding time of a new call can be composed from the PH distributed
$T_{D,N}^k$ and $T_S^k$ as follows. The set of phases of the session duration is
replicated as many times as the residual dwell time has phases, with the rates of
the session duration among the phases within a group. Between the corresponding
states of these groups the rate is equal to the rate between the corresponding
phases of the residual dwell time.

To track the occupied capacity at the base station accurately the service 
time must contain information on the actual bandwidth requirement of the con- 
nection. To achieve this the service time is composed from the channel holding 
time and the VBR describing Markov chain the same way as the channel holding 
time was composed. During this procedure the role of dwell time is replaced with 
the VBR describing Markov chain and the role of call length with the channel 
occupancy distribution. The rate matrix of the VBR Markov chain is denoted 
by the number of possible transmission rates is and its initial vector is 
yk,H j^andover calls and for connections initiated in the cell. 

The service time composed the described way also has a PH distribution, 
with the descriptors: 

where $\oplus$ and $\otimes$ denote the Kronecker sum and product, respectively.

It is easy to show that the distribution of the PH composed in the described
manner is the same as the distribution of $\min\{T_{D,N}^k, T_S^k\}$, i.e. this is indeed
the channel holding time distribution, and it also contains the instantaneous
transmission rate of the connection.
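
The composition by Kronecker operations can be illustrated for the minimum of two independent PH times: if X is PH with initial vector alpha and rate matrix A, and Y is PH with beta and B, then min(X, Y) is PH with initial vector alpha ⊗ beta and rate matrix A ⊕ B. The sketch below assumes independence of the two times and uses hypothetical helper names; matrices are plain lists of rows.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron_sum(A, B):
    """Kronecker sum of square matrices: A (x) I + I (x) B."""
    left = kron(A, identity(len(B)))
    right = kron(identity(len(A)), B)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]

def min_of_ph(alpha, A, beta, B):
    """PH representation (initial vector, rate matrix) of min(X, Y)."""
    start = kron([alpha], [beta])[0]
    return start, kron_sum(A, B)
```

As a check, the minimum of two exponential times with rates 1 and 2 is exponential with rate 3, which is what `min_of_ph([1.0], [[-1.0]], [1.0], [[-2.0]])` yields.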

The service time for handover calls can be composed analogously. Since we
assume that a call does not change its type during transmission, the 2K service
times generated (one for handover and one for newly generated calls of each of
the K types) can be handled independently.

4 Blocking Probabilities 

We showed that the service time of this system with $C_0$ servers has a PH
distribution given by the descriptors above. Since the input process is Poisson,
formally we have an M/PH/$C_0$ queue with phase dependent capacity requirements.

The state of the describing process is the vector n = . . . , , . . . , 

where denotes the number of type k users arrived with handover 
receiving phase i of their service. Let denote the vector containing the 
capacity demands associated with the phases of the service time of a type k 
connection that is initiated in the cell. The valid states of the system are those, 
where for any k: 



This simply means that the total amount of occupied capacity cannot be
larger than the maximum capacity, and each connection type can occupy no more
than its available capacity.

In m we described the state space and the driving process of the proposed 
system. We also pointed out that due to the multiple dimensions of the state 
space the resulting Markov chain may have several millions of states, therefore 
calculating its steady state distribution as the solution of the pQ = 0 global 
balance equations is impossible. 

To calculate blocking probabilities and channel utilization, the steady state
distribution of the system is not required. Rather, the channel occupancy
probabilities are needed, i.e. p(m) = Pr{m units of capacity are occupied in the
base station}.

Given the channel occupancy distribution, the performance parameters of 
the system can be calculated as follows. The call blocking probability in case of 
applying policy one for a type k call initiated in the cell is: 



K 





( 4 ) 



where denotes the number of possible transmission speeds of a type k cus- 
tomer and Ofc is the probability that an arriving connection is of type k. The 
same measure for handover calls is calculated analogously.

If we denote the minimum possible capacity requirement of a type k call with
the call blocking probability for a type k call initiated in the cell applying 
the second service policy has the form of: 

Co 

PB = <^k- 

The channel utilization is simply given as:

$$ \rho = \sum_{m=0}^{C_0} m \cdot p(m). \qquad (6) $$
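
Once the occupancy distribution p(m) is known, the performance measures reduce to simple sums. The sketch below computes the utilization of equation (6) and, as an assumption made only for illustration, a tail sum that counts the states in which a call with a given capacity demand no longer fits into its available capacity; the function names are hypothetical.

```python
def utilization(p):
    """Mean occupied capacity: the sum over m of m * p(m), as in equation (6)."""
    return sum(m * pm for m, pm in enumerate(p))

def blocking_tail(p, available, demand):
    """Probability that a call needing `demand` units does not fit.

    Assumption for illustration: the call is refused whenever the occupied
    capacity m exceeds available - demand.
    """
    return sum(pm for m, pm in enumerate(p) if m > available - demand)
```

For the occupancy vector p = (0.4, 0.4, 0.2) on a pool of two units, the utilization is 0.8 units and a one-unit call is refused with probability 0.2.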

5 Approximate Channel Occupancy Calculation

If we assume the base station capacity $C_0$ to be infinitely large, the Markov
chain of the problem has the nice property that its equilibrium distribution has
a product form. This means that each state's probability can be calculated as
the product of one of its neighbors' probabilities and a given factor. Moreover, if
all the capacity demands were the same (CBR connections), this property would
hold even in the case of finite base station capacity.

The Markov chain has a product form solution if and only if local balance
equations apply throughout the whole state space. We observed that in our
problem the local balance equations hold in the majority of the state space. The
local balance equations change in those states that represent an amount of total
occupied capacity that is so large that some connections cannot be admitted or an
increase of transmission speed is not possible. We refer to these states as the
states of a blocking sub-space. In these blocking sub-spaces local balance equations
also hold, but have a different form compared to the equations of the non-blocking
space. Therefore we conclude that the form of the local balance equations depends
on the total occupied capacity of the base station, which is a key observation
for further analysis. This means that the multiplying factors that appear in the
product form solution also depend on the occupied capacity.

Kaufman |S| and Roberts UDI proposed a recursive formula to compute chan- 
nel occupancy distribution in a shared channel. They considered connections 
with constant capacity requirements. Their method provides exact values in case 
of a product form Markov chain. Their method is based on the mapping of the 
state space into a one dimensional space and they use the multiplying factors in 
their recursive formula to calculate channel occupancy probabilities. 
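
For reference, the classical recursion for connections with constant capacity requirements can be sketched as below: the relative occupancy q(m) satisfies m q(m) = sum over classes of a b q(m - b), where a is the offered load of a class and b its capacity demand. The function names and the representation of classes as (load, units) pairs are assumptions of this sketch.

```python
def kaufman_roberts(capacity, classes):
    """Occupancy distribution of a shared link with constant-rate classes.

    capacity -- number of capacity units of the link
    classes  -- list of (offered_load_in_erlangs, units_per_call) pairs
    """
    q = [0.0] * (capacity + 1)
    q[0] = 1.0
    for m in range(1, capacity + 1):
        # Relative probability of m occupied units.
        q[m] = sum(a * b * q[m - b] for a, b in classes if b <= m) / m
    total = sum(q)
    # Normalize so that the probabilities sum to one.
    return [x / total for x in q]

def class_blocking(p, units):
    """A call needing `units` is blocked when fewer than `units` are free."""
    capacity = len(p) - 1
    return sum(p[m] for m in range(capacity - units + 1, capacity + 1))
```

With a single class of one-unit calls the recursion reduces to the Erlang B model; for two units of capacity and an offered load of 1 erlang the blocking probability is 0.2.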

By examining base stations with infinite capacity we realized that in the
non-blocking sub-space the rates into a phase of the service time due to
arrivals and transmission rate changes are in local balance with the rates out
of the phase due to call termination or, again, transmission rate change. Since
no transition is possible between phases of service time distributions of
different connection types, it is enough to formulate the local balance equations for
a type k connection that was initiated within the cell; for handover calls and
calls of other types the local balance equations are formulated analogously. The
number of customers receiving the different phases of the service time is described
by $\mathbf{n}^{k,N}$, but for the sake of simplicity in this derivation we denote it
by $n'$. The local balance equations have the form of:

• (J + 1) = 

Pin’ + e,)(* + 1) ■ S^f) . (7) 

Since the service time has a PH distribution 

I LJ It 

Writing the equations into vector form, we get:

$$ -\lambda_N \alpha_k\, s^{k,N}\, p(n') = \big( (n'_1+1)\,p(n'+e_1) \;\cdots\; (n'_i+1)\,p(n'+e_i) \;\cdots\; (n'_P+1)\,p(n'+e_P) \big)\, S^{k,N}, \qquad (8) $$

where for the sake of simplicity the number of phases of the service time is
denoted by P. Introducing the vector

$$ P^{k,N} = \left( \frac{(n'_1+1)\,p(n'+e_1)}{p(n')} \;\cdots\; \frac{(n'_P+1)\,p(n'+e_P)}{p(n')} \right) $$

we have

$$ P^{k,N} = -\lambda_N \alpha_k\, s^{k,N} \cdot (S^{k,N})^{-1}. \qquad (9) $$

The vector defined by (9) plays the role of the multiplying factor between the
probabilities of two neighboring states.

In this approach a blocking state can be viewed as if the service time was
changed. Since in a blocking state there are some “unreachable”
phases of the service time, blocking states result in a change of the service
time descriptors $s^{k,N}$ and $S^{k,N}$: those elements of $S^{k,N}$ that represent an illegal
phase transition because of lack of capacity are set to 0, and the diagonal elements
of $S^{k,N}$ are updated accordingly. If admission policy 1 is
applied, the arrival of calls with a capacity demand higher than the available
bandwidth is restricted, thus the corresponding elements of $s^{k,N}$ are set to 0 as well.

From this point of view, the descriptors of the service time distribution
depend on the occupied capacity $x = \mathbf{n} \cdot \mathbf{c}$. The change of service
times under policy 1 can be formulated as:




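$$ s_j^{k,N}(x) = 0, \quad S_{ij}^{k,N}(x) = 0 \quad \forall j : c_j^{k,N} > C^{k,N} - x \qquad (10) $$

If policy 2 is applied, the service time changes as follows:

$$ S_{ij}^{k,N}(x) = 0 \quad \forall j : c_j^{k,N} > C^{k,N} - x \qquad (11) $$

and if $c_z^{k,N}$ is the maximum capacity demand such that $c_z^{k,N} \le C^{k,N} - x$, then

$$ s_z^{k,N}(x) = s_z^{k,N}(0) + \sum_{j : c_j^{k,N} > C^{k,N} - x} s_j^{k,N}(0) \qquad (12) $$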






Then the factor of (9) depends on the occupied capacity as well, therefore it
gets the general form of

$$ P^{k,N}(x) = -\lambda_N \alpha_k\, s^{k,N}(x) \cdot (S^{k,N}(x))^{-1}. \qquad (13) $$



Given these equations supposing local balance, the approximate numerical calcu- 
lation of the blocking probability follows the pattern proposed by Kaufman and 
Roberts 0,mi!. Define p{m) and v{p), the relative and the normalized probabil- 
ity of that m amount of capacity is occupied in equilibrium. p{m) is computed 
as p{m) = 0 for m < 0, p(0) = 1, and for m > 0 



k,N 



P(m) = Eti E.Pim - ~m + 



k,N^ c 

p[m-c^' 



-f: 



k,H f k,H\ 



(14) 



and 



p{m) = p{m) 



1 

E^=o:pM 



(15) 



Finally, the blocking probabilities and the channel utilization are obtained as
in (4)-(6).

Intuitively, this method is more accurate if the system is
under light load conditions, meaning that the blocking sub-spaces have negligible
probabilities compared to the non-blocking space.



6 Numerical Results 

6.1 Accuracy of the Proposed Approximation 

As described previously, we expect our method to give more accurate
results under light load conditions and its accuracy to deteriorate as the offered
traffic increases. To give insight into this dependency on the load, we examined a
base station with $C_0$ = 3.2 Mbps capacity. The VBR nature of the connections
was modeled by three possible transmission rates: 32 kbps, 64 kbps
and 128 kbps. A capacity of 320 kbps was reserved for handover
connections. In such a system a total arrival rate of 15 calls/minute resulted in
a highly overloaded system.

We compared the approximate method with computer simulations. The arrival
rate of calls initiated in the cell was set to 4 per minute and we increased
the arrival rate of handover calls. The measure of accuracy was the cumulative
error, calculated as $\sum_m |p_{sim}(m) - p_{app}(m)|$, where $p_{sim}(m)$ and $p_{app}(m)$
are the probabilities of having m units of capacity occupied obtained by simulation
and by the approximate method, respectively.

We observed, that as the load of the base station increased, the error of 
the approximation increased rapidly, although even in case of very high arrival 
rate the cumulative error was only about 0.1! . Under light load conditions the 
error of the approximation affected only the third or fourth decimal place of the
occupancy probabilities.

Having confirmed the accuracy of our method, we obtained the following results
by the approximate method only. The examined system is never heavily loaded,
therefore the results have negligible error.

6.2 Effect of the Reserved Capacity and the Base Station Policy 

The following results were achieved in a system with 20 Mbps channel capacity. 
Three types of connections were supposed: voice calls with 32 kbps transmission 
speed, type 1 multimedia connections with three possible transmission rates 
(64 kbps lowest rate, 256 kbps average and 512 kbps peak rate) and type 2 
multimedia calls (128 kbps lowest, 256 kbps average and 386 kbps peak rate). 

To examine the effect of the available capacity for different user types, we lowered
the amount of capacity available for type 1 new calls, while the available capacity
for new voice calls was 19.6 Mbps. Figure 1 shows the blocking probabilities of
all connection types as the amount of capacity available for blocking-insensitive
new calls decreases. The left graph shows the results when policy 1,
i.e. immediate blocking, was applied for all connection types; the right graph
shows the case when policy 2 was applied for type 2 handover calls.

The available capacity for blocking insensitive connection is given as the pro- 
portion of the total capacity. As we decrease the available capacity for insensitive 
new calls its blocking probability dramatically increases, however it results in a 
slight decline of the blocking probabilities of all other types. As it is clear from 
the graph applying policy 2 for type 2 handover calls results in a 0.01 decline of 
the blocking probability for this type. 




[Figure: two panels; horizontal axis: available capacity for type 1 new calls]

Fig. 1. Blocking probabilities
Abstract. To provide cellular data services with differentiated QoS in a high
data error-rate environment, a λ-WFQ scheduling discipline, based on the WFQ
mechanism and the Lagrange λ-calculus, was developed. The air resources are
allocated using the λ-calculus and then the WFQ mechanism is responsible for the
transmission scheduling. The λ-WFQ discipline compensates for the penalty derived
from location-dependent errors using the equivalent efficiency concept. This
discipline can generate a fair schedule for a diverse mix of traffic with diverse
QoS requirements in a limited radio spectrum. The experimental results show
that as much as a 5% improvement in the mean acceptance rate is obtained
relative to other existing WFQ-based schemes, at the expense of a small
degradation in blocking performance.



1 Introduction 

Cellular communications have become the fastest growing segment of the
communications industry over the past decade. The objective of wireless cellular
data communications is to provide users with radio access to services comparable to
those currently offered by the fixed infrastructure, resulting in a seamless
convergence of fixed and mobile services [1,2]. The introduction of 3G
broadband properties has contributed to the increase in cellular data communications
[3]. Multimedia applications are expected to become widespread on 3G systems.
Because air interfaces create a variety of problems due to interference, Quality of
Service (QoS) issues have appeared [4,5].

To support a diverse mix of traffic with diverse QoS requirements in 3G systems,
we need a mechanism that schedules the traffic resources to meet the QoS
requirements. Based on the WFQ mechanism and the Lagrange λ-calculus, we developed
a novel scheduling discipline, called λ-WFQ, at the transmitter end to ensure high
QoS [6,7]. The resources are allocated using the λ-calculus and then the WFQ
mechanism is responsible for the transmission scheduling. An acceptance indication,
AI, is used as the QoS index. Based on the traffic class and the residual resources,
the λ-WFQ mechanism can generate a fair and dynamic schedule to guarantee QoS.

While packet scheduling for wired links is a maturing area, the scheduling of
wireless cellular data links is not a well-established science or even a stable craft. A
fundamental difference between wired and wireless links is that wireless media can
exhibit substantial radio link error rates, resulting in a significant loss of link capacity.
In a W-CDMA simulation, we have shown that by using a successive interference
cancellation (SIC) scheme, one can effectively estimate and cancel co-channel
interference and thus substantially reduce the near-far effect [8]. The average
Bit-Error-Rate (BER) in an AWGN channel is shown in Figure 1. However, flows
that experience errors still do not receive their expected QoS.

To support QoS guarantees for multimedia services in cellular data networks with
location-dependent errors, the resource scheduling scheme must take the penalty
derived from the errors into consideration. However, the errors are unknown and
non-deterministic in advance, so resources will be wasted if too many of them are
reserved [9]. Based on the WFQ mechanism, the Lagrange λ-calculus, and the equivalent
efficiency concept, a λ-WFQ scheduling discipline was developed to provide cellular
data services with differentiated QoS [10].




Fig. 1. Comparison of BER with and without SIC in AWGN channel 



2 Background Knowledge 

The existing architectures, scheduling module, and objective functions relative to this 
research are described in the following. 



2.1 Cellular Service Architecture 

Cellular service architecture usually depends on a wireless network. This architecture 
is a hierarchical structure consisting of a backbone network, mobile switching center 
(MSC), base stations (BS) and mobile units. The backbone network is a wired network 
connecting the existing wired links to the MSC or MSC to MSC. The MSC connected 
to the BS is a special switch tailored for mobile applications. The BS manages the
communications activity of a covered geographic area called a cell where the mobile
unit stays. A BS is usually in the center of a cell and neighboring cells overlap with 
each other to ensure the continuity of communication while the mobile users move 
from one cell to another. 

The air interfaces create a variety of problems for mobile units and BSs due to
interference. Interference causes instantaneous data transmission errors
and increases the delay. It may then no longer be possible for air links to meet the
QoS commitments. We must decide how much air resource to assign to each active
mobile unit. We must also decide how much extra air resource to assign to
high-error mobile units at the expense of others.



2.2 WFQ Scheduling Discipline 

The fair queuing algorithm, WFQ, is a scheduling and multiplexing discipline capable 
of providing end-to-end delay and bandwidth guarantees on a per connection basis. 
WFQ works by simulating an idealized fluid flow system and serving packets based 
upon their transmission times in the idealized fluid system. The WFQ scheduler 
distributes air resources according to weights provided by a call scheduling module in 
a cellular service environment [11]. 
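
The tag-based service order of WFQ can be illustrated with a deliberately simplified sketch. It assumes that all packets are already queued at time 0, so the tracking of virtual time for later arrivals is omitted; the function name and the input format are hypothetical.

```python
from fractions import Fraction

def wfq_order(flows):
    """Service order for flows that are all backlogged at time 0.

    flows -- dict: flow name -> (weight, list of packet lengths)
    Each packet gets a virtual finish tag F = F_prev + length / weight;
    packets are served in increasing tag order (ties broken by flow name).
    """
    tagged = []
    for name, (weight, lengths) in flows.items():
        finish = Fraction(0)
        for seq, length in enumerate(lengths):
            # Exact rational arithmetic avoids spurious ordering of equal tags.
            finish += Fraction(length) / Fraction(weight)
            tagged.append((finish, name, seq))
    return [(name, seq) for _, name, seq in sorted(tagged)]
```

For two flows with weights 2 and 1 and unit-length packets, `wfq_order({"a": (2, [1, 1, 1]), "b": (1, [1, 1])})` serves flow a twice before the first packet of flow b, reflecting its larger weight.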

Recently, a number of WFQ-based algorithms based on the error-free service model,
the lead/lag model, the compensation model, and slot/packet queue decoupling have been
proposed for adapting fair queuing to the wireless domain [6]. The error-free service
model provides a reference for how much service a flow should receive in an ideal
error-free channel environment. The goal of wireless fair queuing is to approximate
the error-free service model by making short error bursts transparent to a flow and
only exposing prolonged channel errors to the flow. The lag/lead of a flow denotes
the amount of additional service that the flow must receive/relinquish in the future
to compensate for the service it lost/received in the past. The compensation model
determines how lagging flows make up their lag and how leading flows give up
their lead. In slot/packet queue decoupling, when a packet arrives in the queue of a
flow, a corresponding slot is generated in the slot queue of the flow and tagged
according to the fair queuing algorithm. From the service providers' point of view, we
propose a novel scheduling discipline that assists the WFQ resource scheduling by
achieving the highest acceptance by all mobile users.



2.3 Average Acceptance Rate 

When a mobile user wants to communicate with another, a channel must be requested
from the base station in the initial stage, and this request may succeed or fail. If
there are available channels, the mobile user will be assigned a channel for
communication. If there are no free channels, the request will be rejected. Upon
receiving a channel, the mobile user begins to communicate and may complete the call
in the original cell, where the channel was requested, or in another cell. If the
mobile user completes the call in another cell, at least two different channels must
be used during the call duration (to avoid interference). The procedure for
requesting a new channel while the mobile user moves across a cell's boundary is
generally called a handoff. If the handoff succeeds, the mobile user continues the
communication without interruption. If the handoff fails, the mobile user's call is
force-terminated.

Based on the call life flow, a request and satisfactory service for a mobile user is 
accomplished with 

A. A request will not be blocked in the initial stage 

B. Service will not be force-terminated in the handoff stage 

C. The quality of service is high during the service hold time 

Committing to mobile users’ expectations, we defined an Acceptance Indication, AI, 
as the QoS measurement in cellular data services. Therefore, 



$$ AI = K_1 (1 - P_b) + K_2 (1 - P_f) + K_3 P_s \qquad (1) $$


Ph = '^p{i,j,k) 


( 2 ) 


Pf = p{C j,k) \j^c f-C,.k = Ca.C=C f+Ca 


(3) 




(4) 


$$ K_1 + K_2 + K_3 = 1 \qquad (5) $$



Where

$P_b$: the estimated probability that a request will be blocked
$P_f$: the estimated probability that a service will be force-terminated
$P_s$: the percentage that indicates the degree of satisfaction during the hold time
$P_c$: the estimated probability that a call will be completed
$C_h$: the number of reserved time slots for a handoff call
$C_f$: the fixed channels of a cell
$C_d$: the dynamic channels of a cell
$C_{i,Allocate}$: the expected number of time slots allocated to a request for the ith service
$C_{i,Max}$: the maximum capacity requirement of a request for the ith service
$C$: the total available resources at the scheduling time
$\tau$: the mean service time for a completed call
$\mu$: the mean service rate
$K_1$, $K_2$, $K_3$: the weighting values that concern mobile users

In the above parameters, $P_s$ indicates the mobile user's satisfaction during the call
connection, and $1/\mu$ is the mean service time for a completed call. To increase
flexibility, a Hybrid Channel Allocation scheme was adopted in our research. In this
scheme, the channels are divided into fixed and dynamic sets. The fixed channels, $C_f$,
are assigned to different cells and all of the users share the dynamic channels, $C_d$.



3 λ-WFQ Discipline



A fairness scheme, the λ-WFQ mechanism, based on the Lagrange λ-calculus is
proposed for scheduling resources in 3G data services with QoS provisioning.
The scheduling issue may be identified by the following function and constraints.

Objective function

$$ \max(AI_T), \quad AI_T = AI_1 + AI_2 + AI_3 + \dots + AI_n \quad \text{for } n \text{ service requests} $$

Subject to

1. $\sum_{i=1}^{n} C_{i,Allocate} \le C$, with the constraint function $\phi = 0 = C - \sum_{i=1}^{n} C_{i,Allocate}$

2. $C_{i,Min} \le C_{i,Allocate} \le C_{i,Max}$, $i = 1, 2, \dots, n$

Where

$AI_i$: the acceptance indication of the ith service (defined in Equation (1))
$C_{i,Min}$: the minimum capacity requirement of a request for the ith service



Resource scheduling is a constrained optimization problem that may be attacked
formally using advanced calculus methods that involve the Lagrange λ-calculus. To
establish the necessary conditions for an extreme $AI_T$ value, add the constraint
function to $AI_T$ after $\phi$ has been multiplied by a multiplier, λ.

$$ L = AI_T + \lambda\phi $$

The necessary conditions for an extreme $AI_T$ value result when we take the first
derivative of the Lagrange function $L$ with respect to each of the independent
variables and set the derivatives equal to zero. Based on the 3G application domain,
we only take the derivative of $L$ with respect to the $C_{i,Allocate}$ values at a
scheduling time, which gives the set of equations






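$$ \frac{\partial L}{\partial C_{i,Allocate}} = \frac{dAI_i}{dC_{i,Allocate}} - \lambda = 0, \quad i = 1, 2, \dots, n \qquad (6) $$

$$ \lambda = \frac{\partial AI_1}{\partial C_{1,Allocate}} = \frac{\partial AI_2}{\partial C_{2,Allocate}} \qquad (7) $$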



The necessary condition for the existence of a maximum $AI_T$ for cellular data
services is that the incremental AI of all of the mobile users is equal to λ. Of course,
to this necessary condition we must add the constraint equation that the sum of the
$C_{i,Allocate}$ values must be less than or equal to $C$. In addition, there are two
inequalities that must be satisfied for each request, namely
$C_{i,Min} \le C_{i,Allocate}$ and $C_{i,Allocate} \le C_{i,Max}$.

The operating flow chart is shown in Figure 2. The operating scenarios are described 
as follows. 



I. Estimation of AI for each type of service

Many simulation runs were performed with an OpNet simulator for varying values of
$C_{i,Allocate}$; thus a set of AI values for the specified service class may be estimated.
Through curve fitting of the filtered operating data, the AI function for
each service class may be approximated by a linear or quadratic function, such as

$$ AI_i = a_i + b_i C_{i,Allocate} + c_i C_{i,Allocate}^2, \quad a_i, b_i, c_i = \text{constant} $$

II. λ-calculus

Based on Step I, assume two types of services whose incremental AI values
are represented by the following equations.

$$ \frac{dAI_1}{dC_{1,Allocate}} = 2 c_1 C_{1,Allocate} + b_1 = \lambda $$

$$ \frac{dAI_2}{dC_{2,Allocate}} = 2 c_2 C_{2,Allocate} + b_2 = \lambda $$

$$ C_r = \min\{C,\; C_{1,Max} + C_{2,Max}\} $$

Therefore, $C_{i,Allocate}$ may be estimated once a λ value is selected.

III. Generation of schedule

The iterative process of finding the λ value stops when $\sum_{i=1}^{n} C_{i,\text{Allocate}} = C_T$.

Two cases are discussed in generating the schedule.

Case 1: $C_T = C_{1,\text{Max}} + C_{2,\text{Max}}$, thus

$$C_{1,\text{Allocate}} = C_{1,\text{Max}}, \qquad C_{2,\text{Allocate}} = C_{2,\text{Max}}$$

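Case 2: $C_T = C$. We then recognize the inequality constraints, and the necessary
conditions may be expressed as the set of equations making up Equations
(8)-(10). Three situations are identified.

$$\frac{dAI_i}{dC_{i,\text{Allocate}}} = \lambda \quad \text{for } C_{i,\text{Min}} < C_{i,\text{Allocate}} < C_{i,\text{Max}} \tag{8}$$

$$\frac{dAI_i}{dC_{i,\text{Allocate}}} < \lambda \quad \text{for } C_{i,\text{Allocate}} = C_{i,\text{Max}} \tag{9}$$

$$\frac{dAI_i}{dC_{i,\text{Allocate}}} > \lambda \quad \text{for } C_{i,\text{Allocate}} = C_{i,\text{Min}} \text{ or } 0 \text{ (Blocked)} \tag{10}$$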

Situation I: X^X^ or X^, both of and values will fall into the range 

Situation II: A=A,, the value either or 0 (blocked) because of 

dAI 



dC 



1, Allocate 

■> Ai 



Situation III: X^X^, the value is equal to because of 



dAI 



dC: 



■ < Xa 
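The λ-iteration of Steps II and III can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes quadratic AI functions with positive c_i (so that each allocation grows with λ), clamps every allocation to its [C_min, C_max] range in the spirit of Equations (8)-(10), takes the target as the smaller of the capacity and the sum of the maxima, and searches for λ by bisection, a search rule the text does not specify. The function names and the coefficients of the second service are hypothetical; the first service uses the voice-phone fit of Fig. 5.

```python
def allocate(lam, services):
    """Allocation of each service for a trial multiplier lam.

    Each service is a tuple (b, c, c_min, c_max) for AI = a + b*C + c*C**2,
    so dAI/dC = 2*c*C + b = lam gives C = (lam - b) / (2*c).
    """
    result = []
    for b, c, c_min, c_max in services:
        unclamped = (lam - b) / (2.0 * c)
        # Clamping enforces the per-request minimum and maximum.
        result.append(min(max(unclamped, c_min), c_max))
    return result


def find_schedule(services, capacity, tol=1e-10):
    """Bisect on lam until the allocations sum to the target (assumed search rule)."""
    total_min = sum(s[2] for s in services)
    total_max = sum(s[3] for s in services)
    target = min(capacity, total_max)
    if target < total_min:
        raise ValueError("capacity is below the sum of the minimum requests")
    # Bracket lam between the smallest and largest attainable incremental AI.
    lo = min(b + 2.0 * c * c_min for b, c, c_min, _ in services)
    hi = max(b + 2.0 * c * c_max for b, c, _, c_max in services)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(allocate(mid, services)) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, allocate(lam, services)


# Voice-phone fit (b=0.10, c=0.05) plus a hypothetical second service.
services = [(0.10, 0.05, 0.0, 3.0), (0.07, 0.02, 0.0, 5.0)]
lam, shares = find_schedule(services, capacity=4.0)
```

Here both allocations stay strictly inside their ranges, so both services end up with the same incremental AI, as Equation (8) requires.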



IV. Error compensation 

The λ-WFQ scheduling discipline extends the WFQ algorithm via dynamic weight
adjustments to compensate for the penalty derived from location-dependent errors and
to improve the service quality. The weight values are tuned based on the equivalent
efficiency concept. The scheduling model is shown in Figure 3. The compensation
scenarios are described in the following.

Error Free Environment 

The cellular data services are invoked in an error-free environment. That is, these
services will actually obtain the reserved resources scheduled by the λ-WFQ scheme.

Location-Dependent Error Environment 

To support QoS guarantees for multimedia services in cellular data networks with 
location-dependent errors, we must decide how extra resources are assigned to high- 
error mobile units such that their QoS is supported. 
Fig. 2. The flow chart of the λ-calculus



Fig. 3. λ-WFQ scheduling via error compensation



To achieve a high AI, an equivalent efficiency concept is proposed here. We
define a penalty factor $\beta$ to compensate for the effect of location errors. The
functional block is illustrated in Figure 4.

$$\beta = \frac{n}{\dfrac{1}{1-P_1} + \dfrac{1}{1-P_2} + \cdots + \dfrac{1}{1-P_n}} \tag{11}$$

where $P_1, P_2, \ldots, P_n$ are the location error probabilities of the 1st, 2nd, ..., nth service.

Thus the new weight value of the ith service is

$$\text{Weight}_i' = \text{Weight}_i \cdot \beta \cdot \frac{1}{1-P_i} \tag{12}$$
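As a numerical illustration of the weight compensation, the following Python sketch reads Equation (11) as the number of services divided by the sum of 1/(1 - P_i), and Equation (12) as the old weight scaled by the penalty factor and by 1/(1 - P_i). That reading is an assumption about the garbled originals, and the function names are hypothetical.

```python
def penalty_factor(error_probs):
    """Penalty factor: n divided by the sum of 1 / (1 - P_i)."""
    n = len(error_probs)
    return n / sum(1.0 / (1.0 - p) for p in error_probs)


def compensated_weights(weights, error_probs):
    """New weight of each service: old weight * penalty factor / (1 - P_i)."""
    beta = penalty_factor(error_probs)
    return [w * beta / (1.0 - p) for w, p in zip(weights, error_probs)]


# Two equally weighted services; the second one loses half of its slots to errors.
new_weights = compensated_weights([0.5, 0.5], [0.0, 0.5])
```

Under this reading, a service with a higher error probability receives a larger weight, and for equal original weights the compensated weights still sum to one.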



However, the $B \cdot \text{Weight}_i'$ for several guaranteed QoS services may not be
enough. Hence, the reserved resources are transferred from best-effort services to
guaranteed QoS services.

• Guaranteed QoS services 

_{B* Weight /} ,,-1 (13) 



Add R , 



{S * Weight /'} 
• Best-effort services



. Substract 



R„.-J = I iAdd R 

^ res — j i — > QoSguaranI ee 

j — > best — effort service 



( 14 ) 



) 




Fig. 4. Compensation scenarios 



4 Performance Analysis 

An investigation will be done using Notebooks with Ericsson S888 handsets and 
Nokia Card Phone in the Telecommunications cellular system, Taiwan. The datagram 
is transmitted from the source end of experimental channels (1 control time slot + 7 
traffic time slots) through PSTN network to the destination end of experimental 
channels. An example is shown as follows. Three types of services including voice 
phone, file transfer, and interactive video, are requested at the scheduled time. The 
parameters listed in Equations (2), (3), and (4) were measured and evaluated using an
OpNet simulator. The relationship between $AI_i$ and $C_{i,\text{Allocate}}$ is identified. Through
curve fitting techniques, the $AI_i$ functions are fixed and shown in Figure 5.

Based on the individual AI functions of each request, we observed the following 
simulation results. 




[Plot: AI versus $C_{i,\text{Allocate}}$ (0 to 6) for voice phone, file transfer, and interactive video]

$AI_{\text{voice\_phone}} = 0.21 + 0.10\, C_{i,\text{Allocate}} + 0.05\, C_{i,\text{Allocate}}^2$

AI„. ™,„,,=0.42+0. +0.08*C.^„ J 
1 3+0.07*C„,.„.+0.02*C,... J 



Fig. 5. The AI function for each request ($K_1 = 0.2$, $K_2 = 0.4$, $K_3 = 0.4$)
1. Fair Resource Sharing: If the minimum and maximum resources requested by
the three requests were (0,3), (0,2), and (0,5), respectively, and the available
resource at the scheduled time was from 1 to 8 time units, the simulation result is
as shown in Figure 6. In Figure 6, we observe that many time units are allocated
to interactive video and voice phone. A mobile user expects a higher quality for
real-time services. The more resources are available, the higher the rate.

2. Over-Rating Resource Sharing: The incremental rate is over λ when
$C_{i,\text{Allocate}} > C_{i,\text{Max}}$. In this situation, Equation (9) is suitable to resolve the problem. In the
simulation, this situation occurs if the resource is greater than 8 time units.

3. Under-Rating Resource Sharing: In the simulation, all $C_{i,\text{Min}}$ values were set to
zero, thus the situation that the incremental rate would be under λ did not occur.
However, the $C_{i,\text{Min}}$ value in real-time services is never equal to zero, thus
Equation (10) is suitable to resolve the problem. In the simulation, this situation
occurs if the required resource of voice phone is modified to (1,3) and the
available resource is less than 2 time units.

4. Network Efficiency: The network efficiency is dependent on the service
throughput. Two video-based services were invoked in different error
environments. The efficiency analysis is shown in Figure 7. It was found that the
efficiency is less than 50% if the error rate of one of the services is greater than 0.5.

5. Performance Analysis: Four services were invoked in different error
environments. Four cases were tested and the results are listed in Table 1. A different
capability B may affect the AI values, and different K values may generate
different results. From the simulation results, we found that the average AI values
were dominated by the QoS guarantee services. In addition, the greater the residual
capability is, the higher the average AI is. The results show that the average AI of
the schedule derived from our approach is 5% higher than that of the
traditional WFQ scheme.



5 Conclusion 



A fairness scheme based on the WFQ mechanism, optimization theory, the Lagrange
λ-calculus and the equivalent efficiency concept was proposed for shared resources in
cellular data services with QoS provisioning and location-dependent errors. The
simulation results for three traffic classes clearly demonstrate higher acceptance with
mobile users at the expense of a small loss in blocking performance. Similar methodologies
can be applied to large-scale cellular data service systems with differentiated QoS.




Fig. 6. The simulation result 



Fig. 7. Network efficiency for two video-based services 



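Table 1. Case study (B = 30A, K_1 = 0.1, K_2 = 0.2, K_3 = 0.7). Each case lists P_i (%) / R_res / output.

| Service           | Case 1              | Case 2              | Case 3              | Case 4                |
|-------------------|---------------------|---------------------|---------------------|-----------------------|
| Voice phone       | 0 / 1.0A / 1.0A     | 0 / 1.0A / 1.0A     | 10 / 1.1A / 1.0A    | 20 / 1.2A / 1.00A     |
| Interactive video | 0 / 8.0A / 8.0A     | 30 / 11.4A / 8.0A   | 40 / 13.3A / 8.0A   | 0 / 8.0A / 8.00A      |
| File transfer 1   | 0 / 10.5A / 10.5A   | 0 / 7.2A / 7.2A     | 15 / 8.0A / 6.8A    | 30 / 8.6A / 6.05A     |
| File transfer 2   | 0 / 10.5A / 10.5A   | 30 / 10.3A / 7.2A   | 10 / 7.5A / 6.8A    | 50 / 12.1A / 6.05A    |
| Total             | - / 30.0A / 30.0A   | - / 30.0A / 23.4A   | - / 30.0A / 22.6A   | - / 30.0A / 21.10A    |
| Average AI        | 0.73                | 0.68* / 0.71        | 0.59* / 0.68        | 0.65* / 0.73          |

*results from traditional WFQ scheme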
Restoration from Multiple Faults in WDM Networks
without Wavelength Conversion 
Abstract. This paper addresses the problem of achieving full and fast
restoration while tolerating as many faults as possible in wavelength-routed
wavelength division multiplexing (WDM) networks with no wavelength
conversion. We model the problem of finding the maximum number of faults
tolerated as a constrained ring cover problem, which is a decomposition
problem of exponential complexity. Three heuristic methods, each of which guarantees
that at least one fault can be tolerated, are proposed. The Ear Decomposition
(ED) method can always generate a decomposition, but it guarantees tolerance of
only one fault. The Planar Decomposition (PD) method, which takes
advantage of the bipartite graph model to generate a decomposition, can tolerate
up to f faults, where f is the maximum cardinality between the two bipartite
vertex sets. The Maximally Separated Rings (MSR) method uses a greedy
method to find a decomposition that tolerates as many faults as possible. The
marked-link (ML) method is also proposed to enhance the performance by
marking some links, which are originally used for protection, as available for
normal transmissions. Finally, we evaluate the number of faults tolerated
and the blocking probabilities of these methods in three example networks.



1 Introduction 

As WDM technology is introduced into networking, survivable and flexible
optical networks will be deployed widely in the near future. In particular, restoration
of services within a short time frame after a link or node failure is becoming
a requirement.

Several protection and restoration schemes for enhancing network reliability have
been presented in [1-5, 7, 10]. If we have a fault detection capability in the OXC as in
[9], fault detection can be done only at the endpoints of the failure. Fault
recovery in the path (end-to-end) mode cannot be completed within the optical
layer and must be left to the upper layer since it needs acknowledgement. The
link restoration mode, in contrast, is based on localized control, and the coordination can be
completed within the optical layer if the rerouting paths are static and the resources
are sufficient. The proposed restoration scheme is thus based on localized control and
can be executed within a sub-second time frame instead of relying on an online
computation of backup routes, which is slower but more bandwidth efficient.
The number of wavelength-convertible nodes in the network is another factor
that has a significant impact on the degree of localization and coordination needed
for restoration. This has been demonstrated in a pure ring, in which the lack of
wavelength conversion greatly simplifies and expedites the activation of restoration
[4]. Unfortunately, in a mesh network with only one bi-directional fiber pair in each
link, channel protection is inherently dependent on wavelength conversion since the
assignments of working and protection channels can conflict with each other, and thus
it is not directly applicable to a network without wavelength conversion. Further, the
number of faults tolerated is only one in previous works for either the link fault or
the node fault model [1, 10]. Little work has been done to increase the number of faults
tolerated except the study in [7], which requires the network topology to be planar.

Therefore, in this paper we consider WDM networks without wavelength
conversion. The main objectives of the proposed scheme are fast and full restoration
while tolerating as many faults as possible. Our method creates two fully
connected directed sub-graphs, Working (WSN) and Protection (PSN), of a network
by three heuristic methods such that after some edge/node failures all operational
nodes are still connected by the WSN or PSN sub-graph. We model the problem of
achieving the maximum number of faults tolerated in the construction as a
constrained ring cover problem and show it to be NP-complete. The three heuristic
methods are Ear Decomposition (ED), Planar Decomposition (PD), and Maximally
Separated Rings (MSR). The ED method can always generate a decomposition, but it
guarantees tolerance of only one fault. The PD
method, which models the problem as a bipartite graph, can generate a decomposition
that tolerates up to f faults, where f is the maximum cardinality between the two bipartite
vertex sets. The MSR method uses a greedy method
to find a decomposition that tolerates as many faults as possible.

The rest of this paper is organized as follows. In Section 2, we give the 
preliminaries for our restoration scheme. In Section 3, the problem of finding a 
decomposition to tolerate maximum number of faults is modeled as a ring cover 
problem and three heuristic methods for this problem are detailed. Section 4 depicts 
the performance evaluation. Finally, we conclude our study in Section 5. 



2 Preliminaries 

In this section, we introduce the basic idea of how fast and full restoration can be
achieved by a simple loop-back mechanism under various fault scenarios.



2.1 Single Fault 

The algorithm for selecting proper directions that satisfy the full restoration conditions
for link/node-redundant networks was introduced in [1]. The algorithm is based on
the ear decomposition (ED) method, which is a technique of decomposing a given
network into simpler parts such that the computation on the simpler parts corresponds
to the computation on the entire network. The time complexity of the algorithm is
O(V+E) since it is based on depth-first search. Based on st-numbering [14], we
further exploit a scheme of st-looping to find the associated ring cover set, which
explains why the ear decomposition method can tolerate only one fault. An
example is shown in Fig. 1 to illustrate the results of the algorithm. Fig. 1(a) is an example
16-node graph whose nodes meet the st-numbering rules if (s, t) = (1, 16). Fig. 1(b)
and Fig. 1(c) show the results of selecting the directions in all the links.

For the graph shown in Fig. 1, the associated ring cover sets are $R_1 = \{1, 8, 9, 10, 11,
13, 14, 16\}$, $R_2 = \{1, 2, 3, 7, 10, 11, 13, 14, 16\}$, $R_3 = \{3, 4, 6, 15, 16, 1, 2\}$, $R_4 = \{1, 5, 6,
15, 16\}$, and $R_5 = \{6, 12, 13, 14, 16, 1, 5, 6\}$.

An interesting observation is that every ring in the associated ring cover set contains
the edge (1, 16), that is, (s, t). Under the full restoration requirement, this limits the
method to tolerating one fault only, since the number of backup wavelengths
can be less than the number of affected working wavelengths in the link (s, t). Thus,
the method based on the ear decomposition can tolerate only one fault.
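The observation about the edge (s, t) can be checked mechanically. The short Python sketch below treats each listed ring as a cyclic node sequence in the order printed, which is an assumption about how the sets are meant to be read, and tests whether every ring uses the link between nodes 1 and 16. The helper name ring_links is hypothetical.

```python
def ring_links(nodes):
    """Links of a ring given as a cyclic node sequence.

    A repeated endpoint (such as the trailing 6 in the last ring)
    would produce a self-pair, which is skipped.
    """
    pairs = zip(nodes, nodes[1:] + nodes[:1])
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


# Ring cover sets as listed for the 16-node example graph of Fig. 1.
rings = [
    [1, 8, 9, 10, 11, 13, 14, 16],
    [1, 2, 3, 7, 10, 11, 13, 14, 16],
    [3, 4, 6, 15, 16, 1, 2],
    [1, 5, 6, 15, 16],
    [6, 12, 13, 14, 16, 1, 5, 6],
]

st_link = frozenset((1, 16))
all_contain_st = all(st_link in ring_links(ring) for ring in rings)
```

Because every ring shares this one link, a failure elsewhere always competes for backup capacity on (s, t), which is the bottleneck the text describes.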




Fig. 1. (a) Example st-numbered graph, (b) sub-graph numbered increasingly except
(s, t), (c) sub-graph numbered decreasingly except (s, t).



2.2 Multiple Faults 

We consider an example network and its possible double cycle covers (DCC). On the 
basis of these double cycle covers, we discuss whether double cycle covers can be 
used in a WDM network without wavelength conversion to survive multiple faults. 
Fig. 2 shows a possible double cycle cover in bold lines for the example topology in 
Fig. 1. 

For simplicity, let us consider just two wavelengths A and B. Fig. 3 shows that we 
cannot use one ring to protect another unless we have wavelength convertible nodes at 
nodes 1, 3, 10, and 16, and thus incur significant network management overhead. 
Therefore, using the double cycle cover for WDM recovery would require wavelength 
conversion. 

The maximum number of link failures that can be recovered is $\lfloor f/2 \rfloor$, where f is the
number of faces. In the worst case, for a bi-directional ring with an inner and an outer face,
only one fault can be restored. Although the number of faults tolerated can be greater than
one, the DCC cannot be applied to a WDM network without wavelength conversion.
In the following section, we present a decomposition method that allows multiple faults
without the need for wavelength conversion. Note that “multiple faults” in this paper
means multiple link faults that are not introduced by a single faulty node.
Fig. 2. A double cycle cover for a 16-node network.




Fig. 3. Double ring cover does not work without wavelength converters. 



3 Problem & Heuristic Methods 

From the above discussion, we know that the ED method can work in a WDM
network without wavelength conversion, but not for multiple faults. In contrast, the
DCC works for multiple faults but fails on WDM networks without wavelength
conversion. A compromise can be made by relaxing the ring cover set from a double cycle
cover to a cycle cover without conflicting directions in any link. Once such a ring cover
set is found, the direction in each link is defined accordingly and the graph can be
decomposed into two connected directed sub-graphs. This is similar to ED except
that the members of the ring cover set do not all contain the special edge (s, t).
Thus, tolerating multiple faults is possible for a network with such a ring cover set.

Given a graph G = (V, E), our goal is to find a best ring cover set C for G to
tolerate the maximum number of link faults. This problem is computationally
difficult. Since no polynomial-time algorithm is known for the above
problem, heuristic methods are necessary. The first heuristic method follows the ED
method to guarantee that the maximum number of faults tolerated is one. The second
one is called the planar decomposition (PD) heuristic because it is based on the planar
property. It denotes each possible cycle by a new vertex and determines whether the
edge between each vertex pair exists through the intersection of the cycles.
That is, an edge between two vertices means that the two cycles intersect.

If the graph formed by the new model can be two-colorable, it means that the 
direction assignment in all links is done. Two-colorable graphs are also called 
bipartite graphs. The maximum cardinality between the two parts of the bipartite 
graph is the maximum number of faults tolerated because they can be restored 
simultaneously and routed through disjoint paths without blocking one another. 
However, the graph formed by the new model is not always directly two-colorable. 
Some cycles (vertices) have to be deleted to make two-colorability possible. In order 
to keep as many vertices as possible to increase the number of tolerated faults, the 
deletion must be minimized. This problem of minimum vertex deletion to obtain
a connected sub-graph with the bipartite property has been shown to be NP-complete in
[11]. Therefore, the PD method finds an approximation for the original minimum
deletion problem. The third heuristic method, MSR (maximally separated rings),
directly applies depth-first search on the original graph to greedily locate rings in
the network. It looks for rings iteratively and tries
to avoid covering a link with many different rings. The procedure continues until
every link is covered by some ring in the cover.
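The modeling step of the PD heuristic, one vertex per cycle with an edge wherever two cycles intersect, followed by a two-colorability test, can be sketched in Python as follows. This is a minimal sketch and not the authors' procedure: it assumes that two cycles "intersect" when they share at least one link, it uses a breadth-first two-coloring, and it leaves out the vertex-deletion step entirely. All function names and the three small cycles are hypothetical.

```python
from collections import deque


def ring_links(nodes):
    """Links of a cycle given as a cyclic node sequence."""
    return {frozenset(pair) for pair in zip(nodes, nodes[1:] + nodes[:1])}


def intersection_graph(cycles):
    """One vertex per cycle; an edge joins two cycles that share a link (assumption)."""
    adj = {i: set() for i in range(len(cycles))}
    for i in range(len(cycles)):
        for j in range(i + 1, len(cycles)):
            if cycles[i] & cycles[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def two_color(adj):
    """Return a vertex -> 0/1 coloring, or None if the graph is not bipartite."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # an odd cycle of conflicts exists
    return color


# Three hypothetical cycles: the first two share link (2, 3), the third is separate.
cycles = [ring_links([1, 2, 3]), ring_links([2, 3, 4]), ring_links([5, 6, 7])]
example_adj = intersection_graph(cycles)
coloring = two_color(example_adj)
```

When two_color returns None, some vertices (cycles) would have to be deleted before a direction assignment exists, which is the NP-complete minimization that the heuristic approximates.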

We give an example illustration in Fig. 4 for the PD heuristic method. 




Fig. 4. The process of the PD heuristic on the example network in Fig. 1. 



Based on the same example network in Fig. 1, the first step is to find the basis of
the cycle cover set. The possible planar faces are shown in Fig. 4(a). There are six
possible faces in the original network. Since the links of the outer face completely
contain the inner faces, the induced graph is as shown in Fig. 4(b), which is still
connected without vertex 6. The next step is to delete vertices to make the graph
two-colorable. In Fig. 4(c), vertex 2 is removed from the resulting graph. The final
two-colorable graph is a bipartite graph, as shown in Fig. 4(d). The maximum
cardinality of its two vertex sets is two. Therefore, the decomposition derived by
the heuristic can survive two faults, and the final decomposition is shown in Fig. 4(e).

All these heuristic methods are applied directly to the original network topology.
We observed that some links can be removed while maintaining the network
connectivity. This corresponds to finding the minimal sub-graph by edge deletion, which is
also an NP-complete problem. The removed links are marked. The marked links can
only be used to respond to the needs of the network. Although the average length of
restoration paths can be longer when the marked links are used, the restorability is not
reduced and the blocking probability in working channels is smaller. We call our
enhancement the marked-link (ML) method, which greedily finds the
available removable links. The following section evaluates the blocking
probability loss due to full restoration and the blocking probability gain with the ML
method.



4 Performance Evaluation 

In order to formally compare the performance of the proposed heuristic methods with
that of the unprotected method, we start by making the standard series independent
link assumption introduced by Lee and commonly used in the analysis of
circuit-switched networks [6]. Let p be the probability that a wavelength is used on a hop and
F be the number of available wavelengths on a fiber. Note that since pF is the expected number
of busy wavelengths, p is a measure of the fiber utilization along this path. Now
consider a network without wavelength conversion. The probability of blocking is
the probability that each wavelength is used on at least one of the H hops, i.e.,

$$P_b = \left[1 - (1 - p)^H\right]^F \tag{1}$$
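Equation (1) is easy to evaluate numerically. The Python sketch below reads it as the probability that a single wavelength is busy on at least one of the H hops, raised to the power F because all F wavelengths must be blocked independently. The function name is hypothetical.

```python
def blocking_probability(p, hops, wavelengths):
    """Blocking probability without wavelength conversion.

    p           -- probability that a wavelength is used on a hop
    hops        -- number of hops H on the path (may be a non-integer average)
    wavelengths -- number of wavelengths F per fiber
    """
    busy_on_some_hop = 1.0 - (1.0 - p) ** hops
    return busy_on_some_hop ** wavelengths


# Example: utilization 0.5, two hops, two wavelengths.
example = blocking_probability(0.5, 2, 2)

# Average hop distance of the National network with F = 10 and F = 50.
national_f10 = blocking_probability(0.5, 2.93, 10)
national_f50 = blocking_probability(0.5, 2.93, 50)
```

For a fixed load and hop count, adding wavelengths lowers the blocking probability, since every additional wavelength is another independent chance of finding a free end-to-end channel.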

Three example networks, National Network, Icosahedron, ArpaNet, are shown in 
Fig. 5 and used to evaluate the number of faults tolerated and the blocking 
probability. The main topology features of these networks and the experimental 
results are presented in Table 1. 






Fig. 5. (a) National network, (b) Icosahedron network, (c) ArpaNet network. 



Note that V is the number of nodes, E the number of links, δ the average node
degree, and H the average hop distance for shortest path routing on all node pairs; the
three heuristic methods are ED, PD and MSR. Further, the National network and
Icosahedron network are planar, while the ArpaNet is non-planar. Note that when
performing PD on ArpaNet, two links are ignored to keep planarity.

In general, the PD and the MSR can always find a better ring cover set to select the 
directions for the links of a network than ED. In the following, we focus on the 
blocking probability for three heuristic methods and compare them with the 
unprotected method using the formula derived in Equation (1). 

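Table 1. Main topological parameters of the example networks and the number of faults
tolerated.

| Network     | V  | E  | δ    | H    | Faults tolerated (ED) | Faults tolerated (PD) | Faults tolerated (MSR) |
|-------------|----|----|------|------|-----------------------|-----------------------|------------------------|
| National    | 24 | 41 | 3.38 | 2.93 | 1                     | 9                     | 9                      |
| Icosahedron | 12 | 29 | 5    | 1.64 | 1                     | 7                     | 7                      |
| ArpaNet     | 20 | 31 | 3.01 | 2.81 | 1                     | 3                     | 3                      |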



From Fig. 6, it is observed that the three heuristic methods have no significant
differences in blocking probability. When the ML method is considered, the
change is larger than the difference among the three heuristic methods. In addition, as
more marked links are introduced in the working sub-graph, the blocking probability
improves with the number of marked links. However,
the unprotected method always outperforms the three heuristic methods and the
marked-link method since it does not reserve resources for
protection.





(a) F=10 (b) F=50 

Fig. 6. The blocking probabilities for the three heuristic methods, the unprotected method, and 
the ML method for the National network. 

Another problem is to find the worst-case load, at which the
difference is largest. Fig. 7 clearly shows the differences in blocking probability
between the three heuristic methods, the marked-link method, and the unprotected
method for various loads. There is a saturation load at which the difference is worst.
In Fig. 7(a) with F=10, the saturation occurs at p=0.55, while in Fig. 7(b) with F=50,
the saturation occurs at p=0.1. It can be concluded that when the number of
wavelengths increases, the load that will cause saturation is higher. Combining the
results of Fig. 6 and 7, we can operate a network at a load lower than the saturation
point to keep both the differences in blocking probability and the absolute
blocking probability low.





Fig. 7. Differences in blocking probability between the three heuristic methods, the marked- 
link method, and the unprotected method. 



5 Conclusions 

This paper proposed a fast and full restoration mechanism under multiple-fault
scenarios for WDM networks without wavelength conversion to alleviate the
management overhead. Three heuristic methods, Ear Decomposition (ED), Planar
Decomposition (PD) and Maximally Separated Rings (MSR), are proposed to find
proper directions in the ring cover set. The Marked-Link (ML) method is proposed to
further improve the blocking probabilities of working channels. The analytical results
also show that we can operate a network below a certain saturation load to reduce the
difference in blocking probability between the proposed heuristic methods
and the unprotected one while keeping a lower absolute blocking probability.
Abstract. The so-called media convergence to the Internet is foreseen. As a
consequence of this convergence, MANs will face new demands not only in 
terms of bandwidth, but also in terms of services. To meet these new demands, 
new MAN architectures are required. WDM-based MAN architectures that 
tackle the first problem are available not only in the literature, but also on the 
market. In this paper we deal with the second problem. Specifically, we 
describe a novel WDM-based MAN architecture that supports group 
communication, a service that is expected to increase considerably as new 
applications converge to the Internet. Based on packet switching, the 
architecture supports both point-to-point and point-to-multipoint 
communication in the optical domain. 



1 Introduction 

The convergence of various media, applications and networks to the Internet in the 
near future is foreseen. Such a convergence will affect metropolitan area networks 
(MANs) directly. Firstly, the demand for bandwidth is expected to increase 
enormously. It is arguable whether MAN architectures based on electronic 
technologies will cope with such a foreseen demand. Secondly, Internet traffic will 
have characteristics that will differ even more from those that drove the design of 
MAN architectures currently in operation. MAN architectures were mostly designed 
to cope with long-lived voice flows and, therefore, are circuit-switched. Internet 
Protocol (IP) traffic, however, is inherently bursty and consists mostly of short-lived 
data flows and small packet sizes. 

The bandwidth demand problem has been tackled with the deployment of 
wavelength division multiplexing (WDM). WDM is an optical technology that 
exploits the frequency spectrum of the light, thus enabling several distinct optical 
channels within a single optical fiber, each carrying up to tens of Gbps. 

MAN architectures based on WDM are a fact today. However, they are still based 
on circuit switching and the store-and-forward (i.e., multihop) communication 
paradigm. To cope with the characteristics of the next generation Internet, MAN 
architectures should pursue all-optical (i.e., single-hop) packet switching. Besides the
robustness and better resource utilization that are typical of packet switching,
all-optical packet switching eliminates queuing delays at intermediate nodes and provides
bit rate and protocol transparency.

All-optical WDM packet-switched network architectures are described in [1], [2].
Common to these architectures is the fact that they rely on the slotted-ring concept 
and the use of tunable transmitter (TTx) and fixed receiver (FRx) node architectures. 
In such networks, the total capacity of each channel is divided into (time) slots of 
fixed length. Upon arrival of an empty slot, a node tunes its laser onto that slot's 
wavelength and transmits. At the destination, the payload is obtained from the slot 
and the slot is released, being reused by either the node itself or downstream nodes. 

Whilst TTx/FRx node architectures provide very high performance, they have
some drawbacks. Firstly, tunable lasers are expensive and difficult to control.
Secondly, the use of a single FRx per node makes the support of group
communication somewhat inefficient, and the demand for group communication is
expected to increase considerably as a result of the foreseen convergence. For
instance, a source node may have to transmit the same information W times, where W
denotes the number of distinct wavelengths that the destination nodes can receive
from. This is a considerable drawback.
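The cost of group communication with a single fixed receiver per node can be made concrete with a few lines of Python. The sketch below counts how many times a source must transmit the same payload, namely once per distinct receive wavelength among the destinations. The function name and the node-to-wavelength mapping are hypothetical and serve only as an illustration.

```python
def transmissions_needed(destinations, receive_wavelength):
    """Number of copies a TTx/FRx source must send to reach all destinations.

    receive_wavelength maps each node to the single wavelength its fixed
    receiver listens on; one transmission is needed per distinct wavelength.
    """
    return len({receive_wavelength[node] for node in destinations})


# Hypothetical 6-node ring with three receive wavelengths.
receive_wavelength = {0: "w1", 1: "w2", 2: "w3", 3: "w1", 4: "w2", 5: "w3"}
copies = transmissions_needed([0, 1, 3, 4], receive_wavelength)
```

In this example the four destinations listen on two distinct wavelengths, so the source sends the payload twice; a node that can receive on all wavelengths at once would need only a single transmission.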

Dey et al., in [3], propose a node architecture that deals with these problems.
Unlike other node architectures, this node architecture relies on a fixed transmitter
(FTx), array of receivers (ARxs) configuration. The main benefits of this node
architecture are: i) the management complexity due to TTxs is eliminated, ii) FTxs
are cheap and iii) the support of group communication is more efficient.

The node architecture has different characteristics and requirements and, hence, 
requires medium access control (MAC) mechanisms specifically designed for it. We 
discuss such mechanisms in this paper. We also discuss some performance results that 
were obtained via simulation activities. 

The rest of this paper is organized as follows. In Sect. 2 the network and the node
architectures are described. Sect. 3 describes the MAC protocol that has been designed for
the network and the node architectures of Sect. 2. In Sect. 4
some performance results are shown and analyzed. In Sect. 5 we conclude the paper.



2 The Network and the Node Architectures 

The network architecture is based on an adaptation of the slotted-ring concept to the 
multi-channel nature of WDM. In this architecture, W wavelength channels are used 
to carry payload information. A single extra wavelength channel is used to carry 
control information. The total bandwidth of each channel, including the control one, 
is divided into (time) slots of fixed length. 

Slots across the W payload wavelength channels, herein called payload slots, are 
synchronized in parallel so as to reach each node all at the same time. Slots on the 
control wavelength channel, herein called control slots, are sent slightly ahead of their 
corresponding payload slots. This is to account for the configuration time of the 
fabrics at the nodes. Fig. 1 illustrates how payload slots and control slots are 
synchronized. 




Fig. 1. Slot alignment 

The network is partitioned into S = N / W segments,* where N is the number of
nodes in the network. If N < W then each node is assigned (via management 
operation) an exclusive transmission wavelength channel. Otherwise, S nodes, each 
on a different segment, share the same transmission wavelength. A node can transmit 
on only one wavelength. On the other hand, a node can receive on all wavelengths 
simultaneously. Fig. 2 illustrates the basic network architecture. Although out of the 
scope of this paper, scalability can also be achieved via an interconnected ring 
structure. 
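The partitioning rule can be illustrated with a minimal sketch. It assumes that nodes are numbered consecutively around the ring and that each segment holds W consecutive nodes; the numbering scheme and the function name `assign_wavelengths` are illustrative assumptions, not part of the architecture.

```python
def assign_wavelengths(n_nodes: int, n_wavelengths: int) -> dict[int, tuple[int, int]]:
    """Map node id -> (segment index, transmission wavelength index)."""
    if n_nodes <= n_wavelengths:
        # Every node gets an exclusive wavelength; the ring is a single segment.
        return {node: (0, node) for node in range(n_nodes)}
    if n_nodes % n_wavelengths != 0:
        raise ValueError("N is assumed to be an integer multiple of W")
    # Assumed numbering: segment = node // W, wavelength = node % W,
    # so the S nodes sharing a wavelength sit on S different segments.
    return {node: (node // n_wavelengths, node % n_wavelengths)
            for node in range(n_nodes)}


plan = assign_wavelengths(8, 4)            # the N = 8, W = 4 example of Fig. 2
sharing_w0 = [n for n, (_, w) in plan.items() if w == 0]
n_segments = len({seg for seg, _ in plan.values()})
```

With N = 8 and W = 4 this yields S = 2 segments, and nodes 0 and 4 share the first wavelength.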




Fig. 2. Example of Network Architecture; F = 1 optical fiber; N = 8; W = 4. 

The node architecture is shown in Fig. 3. Each node is equipped with one FTx and 
an array of W FRxs, each tuned on a distinct wavelength, for payload transmission 
purposes. Each node is also equipped with one FTx and one FRx, both operating on 
the same wavelength, for control information transmission purposes. Essential to the 
support of group communication is the adoption of three-state switches and tap- 
couplers in the node architecture. 

A slowly tunable λ-drop is used to separate the wavelength channel carrying the 
control slot from the wavelength channels carrying the payload slots. The control slot 
is converted to the electronic domain and processed by the header processor. To account 
for the time to process a control slot, payload slots are delayed in fiber loops. 



* For the sake of simplicity we assume N to be an integer multiple of W. 
Based on the information contained in the control header, the header processor sets 
the three-state switch to either the bar, split or cross state. If the slot is not destined for that 
node then the switch is set to the bar state. If the slot is destined for that node only then 
the switch is set to the cross state. If the slot is destined for that node and others then 
the switch is set to the split state. 
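The switch-setting rule can be written down as a small sketch. Representing the destinations as a set of node identifiers is an assumption made for illustration; in the node itself this information would come from processing the control slot.

```python
from enum import Enum


class SwitchState(Enum):
    BAR = "bar"      # slot is not destined for this node
    CROSS = "cross"  # slot is destined for this node only
    SPLIT = "split"  # slot is destined for this node and for others


def switch_state(node: int, destinations: set[int]) -> SwitchState:
    """Return the three-state switch setting for `node` (hypothetical helper)."""
    if node not in destinations:
        return SwitchState.BAR
    if destinations == {node}:
        return SwitchState.CROSS
    return SwitchState.SPLIT
```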

A second slowly tunable λ-drop is used to separate the transmission wavelength 
channel of a node from the other wavelength channels. When a slot on the 
transmission wavelength reaches the switch, the latter has already been set to the 
appropriate state. 

The slots on the remaining wavelengths are de-multiplexed and each sent through a 
tap-coupler, which drops a very low percentage of the signal to the connected 
receiver. The signal is converted to the electronic domain and either selected or 
discarded, as determined by the header processor after processing the control slot. The 
signal that passes through each tap-coupler is multiplexed again. 

The process ends with the slot on the transmission wavelength and the slot on the 
control wavelength being added via two slowly tunable λ-adds in tandem. 



3 The MAC Protocol 

A MAC protocol is required to coordinate access in the network. The protocol differs 
from MAC protocols of conventional networks in that it aims at minimizing 
processing delays rather than at optimizing network utilization. In all-optical networks 
bandwidth is plentiful. Protocol processing is the bottleneck. 

The MAC protocol relies on the label switching forwarding paradigm. Label 
switching provides for traffic engineering (TE) and virtual private networking (VPN), 
both essential to the provision of next generation services. Although TE makes no 
sense in the basic network architecture described in Sect. 2, MANs are usually laid 
out as counter-rotating ring topologies or interconnected ring topologies. In these 
topologies,^ TE is extremely important. 

Specifically, the protocol follows the Multi-Protocol Label Switching (MPLS) 
architecture [4], which is under development within the Internet Engineering Task 
Force (IETF). Slots follow a previously established label-switched path (LSP). Each 
node along an LSP maintains a cross-connect table known as the label information table 
(LIT). Each LIT entry maps an input triple of the form <fiber link, wavelength, 
label> to an output triple of the same form. Each control slot is assigned a label at an 
ingress node. At each subsequent node, the label is used as an index into the LIT to 
determine whether the corresponding payload slot should be received, 
forwarded or discarded. 
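As an illustration of the LIT as a data structure, the following sketch stores the table as a dictionary from input triples to output triples. The type name `Triple`, the function `lookup`, and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    fiber_link: int
    wavelength: int
    label: int


def lookup(lit: dict[Triple, Triple], incoming: Triple) -> Optional[Triple]:
    """Return the output triple for an input triple, or None if there is no entry."""
    return lit.get(incoming)


# One illustrative entry: label 17 arriving on fiber 0, wavelength 2
# leaves on the same fiber and wavelength with label 42.
lit = {Triple(0, 2, 17): Triple(0, 2, 42)}
```

A dictionary gives constant-time lookup on average, which matches the stated goal of keeping per-node processing short; a hardware table would of course be organized differently.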

Since control slots are processed at every node along an LSP, the layout of the 
control slot is very important to minimizing protocol latency. Fig. 4 depicts the 
control slot layout. 



[Control slot layout, drawn against a bit ruler numbered 0 to 31. Fields: 
O = Occupancy (free/busy); Label; EXP = Experimental; F = Fairness; S = Stack; 
TTL = Time-to-live; P = Parity; CRC-8 = Cyclic Redundancy Check; RESERVED.] 

Fig. 4. Control slot layout 

The MAC protocol provides a transport service. The payload slot layout is an 
opaque structure in which the MAC protocol has no interest whatsoever. The protocol 
simply transports a frame received from a source node’s high-level data link control 
(HDLC) sub-layer to a destination node’s HDLC sub-layer. This transport service is 
unreliable. Error probabilities in all-optical networks are so low that reliability can be 
better achieved at end nodes’ HDLC sub-layers. Trying to achieve it on a hop-by-hop 
basis just adds dead weight (i.e., useless overhead) to the protocol. 

To achieve high throughputs and low delays, unicast slots are released at 
destination nodes and can be reused by the destination node itself or by any 
downstream node. This implies that transmissions on a given channel can occur 
simultaneously (as long as they take place on distinct links). The term spatial reuse is 
also used to characterize networks that employ destination release. 

Multicast slots can be released either at the last destination or at the source. In the 
former, a multicast slot is forwarded over a point-to-multipoint LSP. This is more 
suitable for networks running dense mode multicast routing. In such networks, 
receivers are densely distributed geographically and, therefore, releasing multicast 
slots at last destination nodes may result in considerable performance improvements. 
Salvador et al., in [5], propose a protocol to construct point-to-multipoint LSPs in 
all-optical WDM networks running dense mode multicast routing. In the latter, a 
multicast slot is simply broadcast. This approach is simpler and more suitable for 
networks running sparse mode multicast routing. In such networks receivers are 
sparsely distributed geographically and, therefore, improvements that can be achieved 
in terms of performance are not worth the cost of complexity that is required to enable 
last destination release. 

^ We do not consider these topologies in this paper. 

Destination release may introduce access unfairness under certain traffic patterns. 
A node that is immediately downstream of a node that is the destination of most of the 
traffic in the network may monopolize the channel and prevent other nodes from 
transmitting. To achieve access fairness, the transmission quota-based mechanism 
proposed in the MetaRing architecture [6] is adopted. Under such a mechanism, 
transmission only takes place if the three following conditions are met: i) there is at 
least one packet waiting for transmission; ii) an empty slot has arrived; and iii) the node 
still has some transmission quota left. Otherwise, the node refrains from transmitting 
and forwards the slot to the next node. 

The fairness mechanism works as follows. Between two visits of the so-called SAT 
signal, a node is allowed to transmit up to l + k > 0 data units, where l and k are 
multiples of the slot size. If upon visit of the SAT signal (i.e., the F bit of the 
incoming control slot is set to 1) a node has transmitted at least l data units, the node’s 
quota is renewed and the SAT signal is immediately forwarded to the next node. 
Otherwise, the node holds the SAT signal (i.e., sets the F bit of the outgoing control 
slot to 0) until it has transmitted at least l data units. The node’s quota is then renewed 
and the SAT signal is forwarded to the next node (i.e., the F bit of the outgoing 
control slot is set to 1). 
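The quota rules can be sketched as a small per-node state machine. The sketch assumes that one transmitted slot counts as one data unit and that the quota between two SAT visits is l + k; the class and method names are hypothetical, and the case of a node with nothing to send is not treated specially because the text does not describe it.

```python
from dataclasses import dataclass


@dataclass
class QuotaState:
    l: int                      # data units a node must have sent before releasing SAT
    k: int                      # additional data units allowed beyond l
    sent: int = 0               # data units sent since the quota was last renewed
    holding_sat: bool = False   # True while the node holds the SAT signal

    def may_transmit(self, backlog: int, slot_free: bool) -> bool:
        # Conditions i) to iii): a packet waits, the slot is empty, quota remains.
        return backlog > 0 and slot_free and self.sent < self.l + self.k

    def on_transmit(self) -> bool:
        """Record one data unit; return True if a held SAT is released now."""
        self.sent += 1
        if self.holding_sat and self.sent >= self.l:
            self._renew()
            return True
        return False

    def on_sat(self) -> bool:
        """Handle a SAT visit; return True if SAT is forwarded immediately."""
        if self.sent >= self.l:
            self._renew()
            return True
        self.holding_sat = True
        return False

    def _renew(self) -> None:
        self.sent = 0
        self.holding_sat = False
```

A node that has sent fewer than l data units when SAT arrives keeps the signal and releases it from `on_transmit` as soon as the threshold is reached.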

Fairness is enforced on a node basis. However, there is one SAT signal regulating 
transmission on each channel. This is to prevent nodes transmitting on highly loaded 
channels from affecting nodes transmitting on lightly loaded channels. If only a single 
signal is used in the network, nodes transmitting on highly loaded channels will hold 
the SAT signal and nodes transmitting on lightly loaded channels will be prevented 
from transmitting while the SAT signal does not arrive. 

Each node maintains one queue per LSP that the node can transmit over (i.e., each 
node maintains at least N-1 queues, where N is the number of nodes in the network). 
In the Internet, packet sizes are variable and mostly small [7], while the slots are of 
fixed, equal size. Thus, with a single queue it may not be possible to achieve 
reasonably good slot utilization due to head-of-the-line (HOL) blocking. 

Queues store frames rather than packets. Frames are formed in advance because 
performing framing on the fly may constitute a bottleneck at certain bit rates. A frame 
may contain one or more packets depending on the slot size and the packet sizes. 
Although concatenating packets at the source and separating them at the destination 
are expensive operations in terms of processing, we believe that this is acceptable 
considering that slots are of fixed size while packets are of variable size. 

One could argue that because packets in the Internet are mostly small [7], small 
slots could be used so that concatenation and separation (CAS) operations would not 
be required. The price to pay for this simplicity, however, is higher control slot 
forwarding rates. This may constrain the network in terms of scalability [8]. 

A frame also contains the source node’s 48-bit MAC address. The destination 
node’s HDLC can use this address to inform the source that certain frames or even 
packets are missing. 

The protocol works as follows. Upon arrival of a slot, a node first verifies whether P is 
correct. If not, the corresponding payload slot is discarded and marked as free. If the 
node can transmit on the payload slot’s wavelength then it attempts transmission. If 
no error is detected, the node proceeds by checking O to find out whether the slot is 
free or busy. 

If the slot is free and the node has sufficient transmission quota left, the MAC layer 
informs HDLC about the arrival of a free slot along with the slot’s fiber link and 
wavelength. Based on the slot’s fiber link and wavelength, HDLC moves the packet 
scheduler to the appropriate queue and selects the frame at the head of the queue. 
HDLC then returns the frame together with the label that describes the LSP over which 
the frame must be sent. 

The MAC protocol updates the control slot’s label field with the received label. 
The other fields are updated accordingly as well. The control slot is forwarded to the 
next node. Transmission of the selected frame is delayed sufficiently to keep payload 
slots synchronized in parallel.^ This assures that misalignment of slots due to 
dispersion is corrected at least once every ring revolution. 

At each subsequent node, the slot’s label is matched to a LIT. If no match is found, 
the corresponding payload slot is discarded and marked as free. If the node can 
transmit on the payload slot’s wavelength then it attempts transmission. If a match is 
found that determines that the slot must be forwarded, the slot’s label is swapped with 
the matched entry’s outgoing label and the slot is forwarded over the outgoing 
interface and the outgoing wavelength. If a match is found that determines that the 
node is a destination of a multicast session, the payload slot’s content is sent up to 
HDLC. The slot is forwarded to the next node according to the matched entry. If a 
match is found that determines that the node is the destination (in case of unicast) or 
the last destination of that slot (in case of multicast), the payload slot’s content is sent 
up to HDLC and the slot is marked as free. If the node can transmit on the slot’s 
wavelength, it attempts transmission. Otherwise, it forwards the empty slot to the next 
node. 
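The per-node decisions taken after the label match can be summarized in a short sketch. For brevity the table is keyed by label only rather than by the full <fiber link, wavelength, label> triple, and the names `Role`, `Entry`, `Outcome` and `process_slot` are hypothetical.

```python
from enum import Enum, auto
from typing import NamedTuple, Optional


class Role(Enum):
    FORWARD = auto()            # transit node on the LSP
    MULTICAST_MEMBER = auto()   # a multicast destination, but not the last one
    LAST_DESTINATION = auto()   # unicast destination or last multicast destination


class Entry(NamedTuple):
    role: Role
    out_label: Optional[int] = None


class Outcome(NamedTuple):
    deliver_to_hdlc: bool       # payload content is sent up to HDLC
    slot_free: bool             # slot is marked as free at this node
    out_label: Optional[int]    # label written into the outgoing control slot


def process_slot(lit: dict[int, Entry], label: int) -> Outcome:
    entry = lit.get(label)
    if entry is None:
        # No match: the payload is discarded and the slot is marked as free.
        return Outcome(False, True, None)
    if entry.role is Role.FORWARD:
        # Swap the label and forward the slot.
        return Outcome(False, False, entry.out_label)
    if entry.role is Role.MULTICAST_MEMBER:
        # Deliver a copy and forward the slot according to the matched entry.
        return Outcome(True, False, entry.out_label)
    # Destination (unicast) or last destination (multicast): deliver and free.
    return Outcome(True, True, None)
```

Whenever `slot_free` is true, a node that can transmit on the slot's wavelength may attempt transmission; otherwise it forwards the empty slot.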



4 Performance Results 

We now present some performance results that were obtained via simulation. 
The simulations considered a 50 km long network with N = 16 nodes, 
equally spaced from one another, and W ∈ {2, 4, 8, 16} wavelengths. Slots are 552 
bytes long and packets are assumed to fit exactly in the slots. Transmission quota Q(l, 
k), where l = 100 and k = 200, is chosen. Each node generates the exact same amount 
of unicast traffic to each other node. Packet arrival is Poisson and is such that every 
queue in each node always has at least one packet ready for transmission. Node 0 also 
generates multicast traffic at the same rate to nodes 1, 3, 5, 7, 9, 11. 



^ Advanced signal dispersion is assumed. 
Fig. 5. Per node average Throughput 

Fig. 5 plots per node throughput. Per node throughput is defined as the number 
of successfully transmitted slots divided by the total number of slots processed by a 
node. Fig. 5 shows that nodes achieve maximum throughput when N ≤ W. There are 
two reasons for this: i) there is no access contention since each node is assigned an 
exclusive transmission channel; and ii) l is greater than the bandwidth × latency product, which 
guarantees that a node will never be prevented from transmitting upon arrival of an 
empty slot. As the number of nodes sharing a given channel increases, the throughput 
of each of these nodes gets worse. 




Fig. 6. Average Channel Utilization 

Fig. 6 plots channel utilization. Channel utilization is defined as the number of 
links traversed by non-empty slots divided by the total number of links traversed by 
both non-empty slots and empty slots (plus inter-slot gap). As expected, the utilization 
of a channel improves as the number of nodes transmitting on that channel increases. 
The explanation is that under uniform and symmetric traffic 
conditions the average number of hops traversed by a non-empty slot from a source to 
a destination is N / 2. Thus, once a slot is released it has to traverse H hops on 
average before being reused, where H is given by N divided by twice the number of nodes 
transmitting on that channel. This explains why, for instance, average 
wavelength utilization approximates 50% when N = W. 
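The argument can be checked with a back-of-the-envelope calculation. The sketch below reads H as N / (2S), with S the number of nodes transmitting on the channel, and takes utilization as busy hops over busy plus idle hops. It ignores the inter-slot gap and multicast traffic, so it is an idealized estimate and not the simulated values of Fig. 6; the function name is hypothetical.

```python
def expected_utilization(n_nodes: int, n_wavelengths: int) -> float:
    """Idealized unicast channel utilization under uniform, symmetric traffic."""
    sharers = max(1, n_nodes // n_wavelengths)   # S: nodes transmitting per channel
    busy_hops = n_nodes / 2                      # average source-to-destination distance
    idle_hops = n_nodes / (2 * sharers)          # H: hops travelled empty before reuse
    return busy_hops / (busy_hops + idle_hops)


for w in (16, 8, 4, 2):
    print(f"N = 16, W = {w:2d}: {expected_utilization(16, w):.3f}")
```

Under these assumptions the estimate simplifies to S / (S + 1), which gives 50% for N = W and grows as more nodes share a channel.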

Note that these figures change in the presence of multicast traffic as there might be 
more than one destination. In this case, average channel utilization will depend on the 
distance in number of hops from a source to the last destination. The greater the 
distance, the higher the channel utilization. 
Fig. 7. Per Queue Average Access Delay 

Fig. 7 plots per queue average access delay. Again as expected, per queue average 
access delay gets higher as: i) the number of possible destinations increases; and ii) 
the number of nodes transmitting on a given channel increases. When N = W, per 
queue average access delay is given by the number of possible destinations. This is 
why, for instance, when N = W = 16 and there are no multicast receivers, per queue 
average access delay equals 15 (a node does not transmit to itself and, therefore, the 
number of possible destinations or number of destination queues equals N-1). 

Per queue average access delay gets higher as the number of nodes transmitting on 
a given channel increases. This is because access contention increases. Consequently, 
it takes longer for the packet scheduler to complete its cycle (i.e., to return to a given 
queue). 



5 Concluding Remarks 

This paper focused on a WDM MAN architecture that has been designed to support 
packet switching in the optical domain. Unlike others, this architecture supports not 
only point-to-point communication, but also point-to-multipoint communication. This 
is an important feature in the support of next generation services. 

The network strives mainly for simplicity. Simplicity is fundamental in our 
network (even more than in conventional networks) because it leads to low processing 
delays. Fiber loops are directly proportional to processing delays of control slots. 
Thus, simplicity leads not only to scalability in terms of bit rate, but also to low cost 
due to the need for shorter fiber loops. 

Simulations showed that the network provides high performance. 
Certainly, channel utilization is not the strength of the architecture, especially when 
W = N. However, MANs may contain up to a couple of hundred nodes while the 
number of wavelengths is likely to be smaller. Furthermore, under certain traffic 
patterns (e.g., multicast), channel utilization improves. Finally, bandwidth is expected 
to be abundant and cheap and, therefore, channel utilization at the levels shown in 
this paper may not be an issue. 
Abstract. Network service providers require cost-effective multi-service 
platforms that can meet their customers’ diverse, dynamic, and demanding 
application requirements. Multi-service platforms require flexible, configurable, 
versatile, scalable, and multi-purpose VLSI solutions. The emerging ASIC 
solutions, for these applications, are appropriately termed Network Processors 
(NPs) or “systems on a chip.” Many of the emerging NPs are limited to only 
processing cell/packet-based traffic with functionalities distributed over several 
chip components. Onex’s intelligent TDM, ATM, Packet (iTAP™) system can 
support TDM-based and cell/packet-based traffic on only two chip components. 
In this article, we provide a short background on the network processors 
followed by an overview of iTAP™’s system architecture. The iTAP™’s 
distributed dynamic bandwidth allocation mechanism will be described. A 
simulation model of some of the algorithms implemented in the iTAP™ is also 
provided. Finally, we will state the concluding remarks and elaborate on the 
issues that require further investigation. 



1 Introduction 

The explosive growth of Internet traffic, the proliferation of optical 
communications technologies, and the emergence of packet switching technologies 
(e.g., IP, ATM, and MPLS) have created new opportunities and demand for 
integrated, scalable, and easily configurable Network Processor (NP) solutions 
that enable the next generation of optical network platforms to provide integrated services. 
Legacy metro and access optical communications platforms rely 
on custom-made ASICs (Application Specific Integrated Circuits), which are limited 
to a specific set of protocols or applications. For example, they typically support only 
ATM or IP switching applications. Even the emerging multi-service platforms have 
to rely on multiple native-mode processing elements and switching fabrics in order to 
support IP, ATM, and TDM services. These approaches lead to “fork-lift” upgrade 
scenarios, which service providers do not want. Service providers (e.g., 
CLECs and ILECs) require solutions that are highly scalable, configurable, 
extensible, and cost-effective. 

The current philosophy towards the design and development of network 
processors (i.e., systems on a chip) is to design unified ASICs that integrate all the 
packet switching and circuit switching functions into as few chips 
as possible. Some of these functions include framing, mapping, traffic shaping, 
classification, routing, prioritization, queueing, and scheduling. The Onex 
iTAP™ system consists of two components, the service processor (SP) and the switch 
element (SE); most of the traffic processing functions listed above 
are implemented on the SP. In this article, we provide some 
background on the requirements for Network Processors. Next, we discuss 
algorithmic and architectural issues specific to Onex’s iTAP™ system. Some 
performance studies of the iTAP™ chipset architecture are also reported. 



2 Background 

In the past decade, the research community has conducted numerous studies and 
achieved significant results in the formulation of sophisticated algorithms for 
on-the-fly forwarding, classification, and scheduling that are capable of enforcing 
quality of service (QoS) while maintaining high scalability. Switches that can scale to 
large capacities have a highly distributed architecture. Most of the functionality in 
these systems is located in the ingress and egress port cards. Ease of implementation 
and cost-effectiveness of the distributed architectures have led to a series of 
highly flexible and highly scalable multi-stage, multi-module switch sets. The ATLANTA 
architecture [1], with its memory/space/memory (MSM) arrangement, was the 
first commercial prototype of this type of architecture. The ATLANTA chip set was 
designed on the assumption that ATM was the dominant protocol of choice. 
The paradigm shift to IP and MPLS, as well as the continuing need for TDM-based 
services, has created a demand for unified ASICs that can support all types of 
services. 

Simultaneous support of TDM-based and packet-based traffic on the same 
Network Processor can be a complex task since TDM-based traffic cannot be 
queued and requires uninterrupted service through the switch fabric. Packet-based 
traffic can be queued and processed by various forwarding functions. However, a 
mechanism must be in place to adapt to changing bandwidth requirements without 
disturbing the TDM-based traffic. The packet arbitration mechanism developed for 
the iTAP™ system is designed for this purpose. 

The packet arbitration mechanism and the scheduling schemes in the line cards 
(ingress and egress) assure that packet traffic receives adequate resources while 
meeting the required QoS guarantees. The purpose of the study in this paper is to 
investigate performance issues related to these schemes and the interaction between 
the arbitration and scheduling mechanisms. In the following, we briefly review 
some of the packet processing functions performed in the iTAP™ system. We also 
highlight several packet scheduling techniques used here. A detailed description of 
the iTAP™ packet arbitration mechanism will be provided in section 4. 



2.1 Packet Processing Functions 

In this section, we use packet processing as a generic term that describes different 
actions applied to a packet once it enters and exits a Network Processor. 
2.1.1 Traffic Forwarding 

The traffic forwarding function performs layer-3 IP packet forwarding. Forwarding 
is the process by which a routing database is consulted to determine which 
output link a packet should be forwarded on. This database 
consultation must be extremely fast to keep up with the rate of arrival of packets. 
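As an illustration of what consulting the routing database involves, the sketch below performs a longest-prefix match, the standard rule for IP forwarding. The linear scan, the table contents, and the link names are illustrative assumptions only; the lookup structures actually used in the iTAP™ service processor are not described here, and a linear scan would not sustain line rate.

```python
import ipaddress


def longest_prefix_match(table: dict[str, str], destination: str):
    """Return the output link of the most specific prefix covering `destination`."""
    addr = ipaddress.ip_address(destination)
    best = None  # (prefix length, output link) of the best match so far
    for prefix, link in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, link)
    return None if best is None else best[1]


# Hypothetical routing database: prefix -> output link.
routes = {"10.0.0.0/8": "link-1", "10.1.0.0/16": "link-2", "0.0.0.0/0": "link-0"}
```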

2.1.2 Traffic Classification 

Enhanced IP services such as differentiated services (DiffServ), fine-grained quality of 
service, virtual private networks (VPNs), and distributed firewalls may require different 
treatment of packets as they enter the IP network. These services need packet classification, 
which is the process of determining which flow a packet belongs to, based on one or 
more fields in the packet header. The header fields that can be used in packet 
classification are the source and destination IP addresses, the protocol type, 
and the source and destination port numbers. By specifying valid ranges for any of the 
header fields, different rules for classification can be established. Using intelligent 
classification algorithms, the iTAP™ service processor can easily support this feature at 
OC-48 link rates [6], [7]. 
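Range-based classification on these header fields can be sketched as follows. The rule format, the first-match policy, and all names are assumptions made for illustration; they are not the classification algorithms of [6] and [7].

```python
import ipaddress
from typing import NamedTuple, Optional


class Header(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


class Rule(NamedTuple):
    flow: str
    src_net: str = "0.0.0.0/0"             # address ranges given as prefixes
    dst_net: str = "0.0.0.0/0"
    protocols: range = range(0, 256)
    src_ports: range = range(0, 65536)
    dst_ports: range = range(0, 65536)


def classify(rules: list[Rule], h: Header) -> Optional[str]:
    """Return the flow of the first rule whose ranges all contain the header fields."""
    for r in rules:
        if (ipaddress.ip_address(h.src_ip) in ipaddress.ip_network(r.src_net)
                and ipaddress.ip_address(h.dst_ip) in ipaddress.ip_network(r.dst_net)
                and h.protocol in r.protocols
                and h.src_port in r.src_ports
                and h.dst_port in r.dst_ports):
            return r.flow
    return None


rules = [Rule("web", protocols=range(6, 7), dst_ports=range(80, 81)), Rule("default")]
```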

2.1.3 Traffic Policing and Congestion Management 

Traffic policing is the process by which incoming traffic is examined to determine 
whether or not it conforms to the negotiated traffic contract. During this process, 
packets that have violated the traffic contract must be appropriately identified, marked 
or dropped. Congestion management is the process in which the nonconforming 
traffic is dropped. The dropping process can be achieved in a way that allows the user 
applications to adapt their transmission rates to network congestion conditions. For 
the congestion management function, the iTAP™ system implements variations of the 
weighted random early detection (WRED) scheme [8] to alleviate possible 
congestion. 
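For orientation, the textbook RED drop curve that WRED builds on can be sketched as below: the drop probability rises linearly between a minimum and a maximum threshold on the average queue length, and WRED applies a different parameter set per traffic class. The thresholds, class names and function name are hypothetical, and the iTAP™ variations themselves are not specified here.

```python
def wred_drop_probability(avg_queue: float, min_th: float,
                          max_th: float, max_p: float) -> float:
    """Textbook RED drop curve for one (min_th, max_th, max_p) profile."""
    if avg_queue < min_th:
        return 0.0                      # below the minimum threshold: never drop
    if avg_queue >= max_th:
        return 1.0                      # at or above the maximum threshold: always drop
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical per-class profiles: traffic tagged by the policer is dropped earlier.
PROFILES = {"conforming": (40, 80, 0.05), "tagged": (20, 40, 0.20)}

p_conforming = wred_drop_probability(30, *PROFILES["conforming"])
p_tagged = wred_drop_probability(30, *PROFILES["tagged"])
```

At the same average queue length, the tagged class sees a non-zero drop probability while conforming traffic is still untouched, which is the weighting the scheme is named for.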

2.1.4 Packet Scheduling 

Packet scheduling is the process of time-stamping packets according to their priority 
for departure on the output link. Several packet schedulers are 
described in the following sections. 

2.1.4.1 Weighted Round Robin 

Weighted round robin (WRR) is a simple scheduling scheme with very low 
implementation complexity that enforces accurate bandwidth guarantees at the 
expense of excessive delay bounds. WRR is used for scheduling request elements for 
transmission of packet data units (PDUs) through the switch fabric and the egress 
service processor. More discussion on this scheduler will follow later. 
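A minimal sketch of the WRR discipline follows: in every round each queue may send up to its weight in elements. The queue names, weights and function name are hypothetical and do not reflect the iTAP™ configuration.

```python
from collections import deque


def weighted_round_robin(queues: dict[str, deque], weights: dict[str, int],
                         rounds: int) -> list:
    """Serve up to `weights[name]` elements from each queue per round."""
    order = []
    for _ in range(rounds):
        for name, weight in weights.items():
            q = queues[name]
            for _ in range(weight):
                if not q:
                    break               # an empty queue forfeits the rest of its turn
                order.append(q.popleft())
    return order


queues = {"a": deque(["a1", "a2", "a3"]), "b": deque(["b1", "b2", "b3"])}
served = weighted_round_robin(queues, {"a": 2, "b": 1}, rounds=2)
```

Because a queue must wait for all other queues to exhaust their weights before being served again, the worst-case delay grows with the sum of the weights, which is the delay penalty mentioned above.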

2.1.4.2 Generalized Processor Sharing 

Generalized processor sharing (GPS) [2] is a service mechanism with the following 
attributes: 

• Minimum bandwidth guarantees. 

• Deterministic end-to-end delay bounds. 

• Fairness in the distribution of service among different flows. 
Since GPS is an idealized discipline that does not transmit packets as entities, a 
packet-by-packet approximation to the transmission scheme is needed. A weighted 
fair queueing (WFQ) scheduler can be used to emulate the bit-by-bit GPS. Several 
variations of weighted fair queueing have been investigated and implemented over the 
past decade [3], [4], [5]. A computationally efficient variation of WFQ that 
achieves fairness regardless of variation in server capacity is start-time fair 
queueing (SFQ) [3]. 
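The tagging rule of start-time fair queueing can be sketched briefly: each arriving packet receives a start tag equal to the larger of the current virtual time and the finish tag of the previous packet of its flow, packets are served in increasing start-tag order, and the virtual time is the start tag of the packet in service. The class below is a simplified illustration with hypothetical names; it omits the virtual-time update at the end of a busy period.

```python
import heapq
from itertools import count


class SFQ:
    def __init__(self, weights: dict[str, float]):
        self.weights = weights
        self.virtual_time = 0.0
        self.last_finish = {flow: 0.0 for flow in weights}
        self._heap = []
        self._seq = count()             # tie-breaker: arrival order

    def enqueue(self, flow: str, length: float) -> None:
        start = max(self.virtual_time, self.last_finish[flow])
        self.last_finish[flow] = start + length / self.weights[flow]
        heapq.heappush(self._heap, (start, next(self._seq), flow, length))

    def dequeue(self):
        start, _, flow, length = heapq.heappop(self._heap)
        self.virtual_time = start       # virtual time = start tag of packet in service
        return flow, length


sfq = SFQ({"a": 2.0, "b": 1.0})
for _ in range(4):
    sfq.enqueue("a", 2.0)
for _ in range(4):
    sfq.enqueue("b", 2.0)
order = [sfq.dequeue()[0] for _ in range(6)]
```

With weights 2 and 1 and equal packet lengths, flow a receives four of the first six transmissions and flow b receives two.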




3 System Architecture 

Figure 1 illustrates the high-level architecture of the Onex iTAP™ system. This system 
consists of two components, the service processor (SP) and the switch element (SE). 
The service processor is designed to support SONET OC-12 and OC-48 
interfaces as well as UTOPIA level 3 for ATM and POS, and can be extended to 
higher rates (e.g., 10 Gbps). The switch element is capable of supporting up to 12 ingress and 
12 egress service processors, which can aggregate to 30 Gbps of throughput, and up to 
10 Tbps if a multi-stage configuration is used. 

The service processor, once properly configured, can simultaneously process 
packet-based and TDM-based traffic. The provisioned TDM traffic flows through the 
switch element without being disturbed by the statistically multiplexed data traffic. 
The packet-based traffic (e.g., IP, MPLS, ATM), on the other hand, must go through 
several stages of packet processing. Upon the arrival of data traffic in the ingress 
service processor, the packet/cell headers are extracted for further processing while 
the payload is queued up to be scheduled through the switch element and the egress 
service processor. The first processing phase for a packet/cell is a layer 3 IP address 
lookup or a layer 2 label/VCI/VPI table lookup. Immediately after that, the classification 
process determines which flow a packet belongs to based on one or more fields in the 
packet header. 

The policing and congestion management processes take place immediately afterwards: 
IP packets and ATM cells undergo usage parameter control to check conformance. 
If a traffic contract is found to be violated, the policer must 
enforce appropriate actions, which include tagging the violating traffic. The 
responsibility of the congestion management process is to enforce the traffic contract 
rules under which the violating PDUs are discarded using the WRED scheme. 



[Figure labels: Service Processor, Switch Fabric, Service Processor; TDM and DATA 
traffic; Row time; Reverse] 

Fig. 1. High level architecture of iTAP™ network processors and switch fabric. 



All of the received traffic is destined for the switch fabric and is mapped into 
Onex’s proprietary row format (see Figure 2). This row consists of 1700 slots of 
36 bits each. Each slot carries 4 bytes of data and 4 bits of control information. 
TDM-based data is allocated dedicated bandwidth through the switch fabric, and the Onex 
proprietary row format is designed to optimally support TDM traffic down to the 
VT-1.5 and VC-12 level. Incoming TDM traffic is never buffered; it is routed directly to 
pre-allocated and pre-configured slots in the outgoing rows. TDM data may be 
switched through the SE with a finest granularity of 1 slot per row. This means that a 
TDM stream may be switched from any slot on any input link to any slot on any 
output link. ATM cells, IP packets, MPLS packets, and PPP frames are not allocated 
dedicated bandwidth through the switch fabric. The bandwidth for data services must 
be arbitrated through the switch and the egress Service Processor. The row is 
designed to support a super-slot or data-slot, which is an aggregation of 16 single slots. 
The switch row is serialized and distributed across high speed serial ports for 
transmission to the switch. 
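The figures quoted for the row format can be cross-checked with simple arithmetic. The constant names below are illustrative; the last value is only the upper bound obtained by integer division, since how a row is actually split between TDM slots and data-slots is not stated here.

```python
SLOTS_PER_ROW = 1700
SLOT_BITS = 36
DATA_BYTES_PER_SLOT = 4
CONTROL_BITS_PER_SLOT = 4
SLOTS_PER_DATA_SLOT = 16

# A slot is 4 bytes of data plus 4 bits of control information.
assert DATA_BYTES_PER_SLOT * 8 + CONTROL_BITS_PER_SLOT == SLOT_BITS

row_bits = SLOTS_PER_ROW * SLOT_BITS                                  # bits per row
row_payload_bytes = SLOTS_PER_ROW * DATA_BYTES_PER_SLOT               # data bytes per row
data_slot_payload_bytes = SLOTS_PER_DATA_SLOT * DATA_BYTES_PER_SLOT   # bytes per data-slot
max_whole_data_slots = SLOTS_PER_ROW // SLOTS_PER_DATA_SLOT           # upper bound only

print(row_bits, row_payload_bytes, data_slot_payload_bytes, max_whole_data_slots)
```

A row therefore carries 61,200 bits, of which 6,800 bytes are data, and one data-slot carries 64 bytes of data.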
In the Transmit direction, the Service Processor responds to arbitration requests 
from a Switch chip. The arbitration grant from the Service Processor is based on the 
available buffer space in the traffic memory. TDM traffic and granted data traffic are 
received through the high speed switch interface. TDM traffic is mapped directly into 
outgoing SPEs. ATM cells, IP packets, MPLS packets, and PPP frames are all 
buffered externally, where they wait to be scheduled out of a transmit SONET or 
UTOPIA interface. 



4 Arbitration Mechanism 

4.1 Distributed Dynamic Bandwidth Allocation 

Each service processor operates without any foreknowledge of what the other service 
processors are doing. As a result, before sending their PDUs, they need to know 
two things (see Figure 3): 

• Does the egress service processor have room in its queues for this PDU? 

• Is there bandwidth in the chosen path to get the data from one end of the switch 
fabric to the other without packet loss? 

The arbitration mechanism will check both of these criteria and send back a grant 
message to the requesting service processor on a PDU by PDU basis. When a service 
processor has received a grant it knows for certain that the data will make it to the 
egress service processor (except during system failure). 

Each row time, the service processors will make a request for each PDU to be sent 
in the next row. There can be up to 96 request elements per row time, one element for 
each possible data PDU. These request elements will be multiplexed in with the data 
stream. The requests stream through the switch. Contention resolution is achieved via 
a priority based knockout mechanism. This is important because the 12 inputs could all 
converge on a single output, in which case the outgoing link would not be able to handle 
the traffic presented to it. A small buffer pool exists in each output link to hold some of the 
requests when multiple requests destined for the same output link come into the 
switch chip. At the far end of the switch fabric, the service processor will make a 
decision to grant or deny a request based on the depth of its QoS queues. The service 
processor will then source a grant message which also travels through the switch 
fabric, but in an out-of-band overlay network which goes in the opposite direction of 
the switch fabric. The grants will be written without regard to priority into a 
first-in-first-out (FIFO) queue and read in order of arrival time. 



Fig. 3. Queueing model of the iTAP system service processor and switch element.



The arbitration mechanism will work as follows:

• At the start of a row time, the input service processor will begin sending its
requests. A request element is a request for a single group’s worth of
bandwidth in the switch fabric destined for a particular service processor.

• The first stage in the switch fabric will look at each request from all 12 input
links as well as from the multicast and control message controller. The requests
traverse the switch fabric by using a self-routing tag, which indicates the hop-by-hop
output ports used at each stage of the switch fabric. At this time, the Stage
1 hop-by-hop field will be replaced with the input port number that the request
entered on.

• The requests for each output link will be stored in a buffer pool. As long as
buffers are free, requests will be stored. As soon as there are no free buffers,
lower-priority requests will be overwritten with higher-priority ones. The
request buffers will be able to support 12 input links all converging on a single
output, meaning that 12 request elements can be written to the buffer pool
every request time. Requests are evicted from the buffer pool based on priority
and age: the youngest lowest-priority requests will be dropped, and the
highest-priority oldest requests will be kept.

• The request buffers will then be read. After a request is read, the request
element is deleted from the buffer, making room for another request element.
Any requests still in the buffers will be dropped.

• This process continues all the way to the end of the switch fabric and into the
service processor. The service processor will decide to accept or
reject the request based on the QoS field. Then, it will source a grant message.
The grant message uses the modified self-routing tag of the request element to
traverse the switch fabric backwards using an overlay network.
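
The buffering and eviction rules listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iTAP implementation: the names `Request` and `KnockoutBuffer`, the pool capacity, and the convention that a larger number means a higher priority are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Request:
    priority: int  # assumption: larger value = higher priority
    arrival: int   # arrival time; smaller value = older request
    src_port: int  # input port the request entered on


class KnockoutBuffer:
    """Per-output-link request pool with priority/age knockout (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pool: List[Request] = []

    def offer(self, req: Request) -> bool:
        """Store req; when the pool is full, knock out the youngest lowest-priority entry."""
        if len(self.pool) < self.capacity:
            self.pool.append(req)
            return True
        # Victim: lowest priority first, then youngest (largest arrival time).
        victim = min(self.pool, key=lambda r: (r.priority, -r.arrival))
        if req.priority > victim.priority:
            self.pool.remove(victim)
            self.pool.append(req)
            return True
        return False  # the new request is itself the one that is dropped

    def read(self) -> Optional[Request]:
        """Read and delete the highest-priority, oldest request."""
        if not self.pool:
            return None
        best = max(self.pool, key=lambda r: (r.priority, -r.arrival))
        self.pool.remove(best)
        return best
```

With a capacity of two, a third request of equal priority is refused, while a later higher-priority request displaces the younger of the two stored ones.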

In section 5, we describe a simulation model of the scheduling and arbitration 
mechanisms used in the iTAP™ system. 



5 Simulation Model of iTAP System 

In this section, we briefly describe the model of the iTAP™ system constructed on the
Opnet simulation platform. The primary purpose of this study is to observe the
interactions among four major components of the iTAP™ system: the packet
scheduling on the ingress service processor, the packet arbitration mechanism, the
knockout priority queue mechanism in the switch element, and the packet scheduling
on the egress service processor. The specific algorithms used for these disciplines are
described below.

The service processors include: 

• Weighted round robin (WRR) scheduler [1], to schedule the packet-based
traffic in the ingress service processor,

• Weighted fair queueing (WFQ) scheduler [2], to schedule packet-based traffic
on the egress ports,

• Egress port buffer management, 

• Packet arbitration mechanism. 
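
As a rough illustration of the ingress discipline named in the list above, the following Python sketch serves per-flow queues in weighted round robin order. The function name `wrr_schedule`, the weights, and the queue contents are hypothetical and chosen only for the example; the actual iTAP scheduler is not specified at this level in the text.

```python
from collections import deque


def wrr_schedule(queues, weights, rounds):
    """Serve up to weights[i] PDUs from queue i in each round; return the service order."""
    served = []
    for _ in range(rounds):
        for flow_id, queue in enumerate(queues):
            for _ in range(weights[flow_id]):
                if not queue:
                    break  # an empty queue forfeits the rest of its turn
                served.append(queue.popleft())
    return served


# Two flows with hypothetical weights 3 and 1.
queues = [deque(f"A{i}" for i in range(4)), deque(f"B{i}" for i in range(4))]
order = wrr_schedule(queues, weights=[3, 1], rounds=2)
print(order)  # ['A0', 'A1', 'A2', 'B0', 'A3', 'B1']
```

While both queues are backlogged, flow A receives three service opportunities for every one given to flow B.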

The switch element consists of: 

• A knockout priority queueing mechanism to prevent possible congestion
when multiple ingress ports converge onto the same egress port [4].

The left side of Figure 4 depicts the WRR scheduling of the
request elements and their possible locations with reference to the row format.
The knockout priority buffer for the request elements in the switch element
is shown on the right side of Figure 4. A view of the Opnet model of the iTAP™
system is illustrated on the left side of Figure 5. In this study, a single-stage switch
fabric is shown with interfaces to 12 ingress and 12 egress service processors. The
buffer management thresholds at the output queues in the egress service processor are
illustrated on the right side of Figure 5.

Fig. 4. Weighted round robin scheduling in the ingress service processor, knockout
request element queue in the switch element.



5.1 Highly Overloaded System 

In the following, we consider a scenario in which the
input generation rate is much larger than the service (drain) rate. In this experiment,
three flows are configured to generate PDUs at an average of 800 Mbps each according
to a Poisson process. Each flow is assigned a different QoS value, and the QoS values
determine the treatment each flow receives according to its predetermined
weight. These flows originate from three different ingress ports and are destined
for the same egress port; see Table 1 for more details. The service rate at the output
port is set to 800 Mbps. In this highly overloaded scenario, we observe the
performance of the system when it is subject to congestion. Initially, the
flows are assigned the same weights (e.g., WRR weights) at the ingress side, the same
switch knockout priority, and different scheduler weights at the
egress side. In the next experiment, different switch knockout priorities are used to
allow the higher-priority flows to move through the switch. In the third experiment,
we assign different WRR weights in the ingress service processor. We expect to
observe a performance improvement for higher-priority flows. Figure 6 illustrates the time
average of flow throughput for the three flows described above.

Fig. 5. Network view of the iTAP™ system model in Opnet simulation platform.



Table 1. Flow configuration for the simulation experiment.

Number of Flows   Generation Rate/Flow   Drain Rate   Queue Thresholds   QoS For Low Thresh
3                 800 Mbps               800 Mbps     (200,500) PDUs     >1



In Figure 6(a), the three curves represent the throughput behavior of the three
flows. The two high-QoS flows, numbers 0 and 1, receive a larger share of resources (e.g.,
bandwidth at the egress port) than flow 2. The abrupt reduction of the
throughput for flow number 2 is due to the buffer management threshold used in the
output queue. In Figure 6(b), as different knockout priorities are used in the switch
element, we can readily observe their effect on flow number 0, whose
performance improves because the request elements for flows number 1 and 2 are knocked out
more often by the switch element. Finally, when different WRR weights are assigned
to the test flows, the flow with the highest weight (e.g., flow number 0) receives
better performance; see Figure 6(c). It is also interesting to observe that the aggregate
throughput of the three test flows sums to 800 Mbps, which is equal to the egress-side
drain rate.

Fig. 6. Time average of flow throughput for three flows with SFQ, knockout priorities and
WRR. a) SWFQ only, b) Knockout priority and SWFQ, c) WRR, Knockout priority and
SWFQ.
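
The observation that the three throughputs sum to the drain rate can be checked with a small calculation. The sketch below assumes an idealised work-conserving weighted scheduler in which every flow stays backlogged, so each flow receives the drain rate in proportion to its weight. The weights used here are hypothetical (the text does not give them), and the effect of the buffer management thresholds is not modelled.

```python
def backlogged_shares(drain_rate_mbps, weights):
    """Split the drain rate among permanently backlogged flows in proportion to weight."""
    total_weight = sum(weights)
    return [drain_rate_mbps * w / total_weight for w in weights]


offered_mbps = [800.0, 800.0, 800.0]   # three flows, as in Table 1
drain_mbps = 800.0                      # egress service rate, as in Table 1
overload_factor = sum(offered_mbps) / drain_mbps   # offered load / capacity

shares = backlogged_shares(drain_mbps, weights=[4, 2, 1])  # hypothetical weights
print(overload_factor, shares, sum(shares))
```

Whatever weights are chosen, the shares add up to the drain rate, which matches the aggregate behaviour reported for Figure 6.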



6 Conclusions 

In this article, we presented the architecture of Onex’s iTAP™ Network
Processor. These processors can provide packet-based and TDM-based functions
suitable for multi-service platforms. An arbitration algorithm used for packet-based
traffic was described. An event-driven simulation model of the scheduling and arbitration
algorithms was developed in the Opnet environment. The preliminary results
indicate that the mechanisms built to handle data services in conjunction with TDM
services achieve desirable performance in terms of user throughput and end-to-end
delay bounds.



Abstract. The goal of this paper is traffic management in the multi-service
optical network (ROM) context. We suggest centralising the traffic
management policies at the interface between the client layers and the
ROM network. The key idea is then to exploit, in an optimal way, the
electronic memories in the electronic interfaces at the edge nodes of the
optical network to control the incoming traffic.

We particularly study the impact of traffic shaping at the ROM periphery
on the end-to-end performance in terms of loss and delay.



1 Preliminaries and Problem Relevance 

With worldwide coverage giving access to a large range of data banks and
the massive introduction of home PCs capable of handling IP connections,
interactive applications are expected to become an important future market.
This is well understood by the IP community, and efforts are currently devoted to
proposing new techniques adapted to IP to offer a new Quality of Service (QoS). In
addition, incumbent and emerging carriers have adopted a mix of Sonet/SDH,
ATM, and IP. There is clear agreement that the optimal solution delivers
voice and data transport efficiently; is scalable, flexible, cost-effective, and reliable;
and offers QoS. Optics has been identified as a technology capable of providing
large transport capacity in point-to-point transmission systems. Optical cross-connects
have been studied for several years with a view to an all-optical architecture
capable of managing optical wavelength channels. However, the sporadic nature of
Internet traffic pushes equipment manufacturers to envisage all-optical packet
switching networks that provide the required flexibility together with the capacity
[1]. Thus, the objective of the French ROM project¹ is to demonstrate the feasibility
of a multi-service optical network where optics provide the underlying
network architecture, compatible with the QoS requested by client layers at the
periphery of the considered optical layer [2]. The lack of optical memories and
the need to limit the use of electronic memories argue in favour of the physical
resource naturally offered by optics: the wavelength dimension. In the context of
broadband networks, with bandwidth up to or larger than 1 Tbit/s, the interface
between the electronic and optical sub-networks poses challenging issues. This
interface must reconcile and interconnect these two worlds. This paper focuses
on end-to-end traffic engineering as a means to end-to-end traffic management
between the client layer and the ROM network in the presence of different traffic
types with different QoS requirements. We particularly study the impact of
traffic shaping at the ROM periphery on the end-to-end performance in terms of
loss and delay. The self-similar nature of Internet traffic has been demonstrated
by several measurements and statistical studies [3]. It has a direct impact on
network dimensioning; in particular, buffer sizing is crucial. On the one hand, the
buffer size should be large enough to absorb the very long traffic bursts introduced
by self-similar characteristics; on the other hand, the size should not be so
large as to introduce unacceptable delays. In this paper, the proposed solution
when designing the network is to increase the buffer size at the admission points
in order to smooth out the peaks and valleys, and to dimension the optical links
and memories so that the optical network can operate at high peak-to-average
load. Thus, considering the intrinsic characteristics of IP-based traffic, such
as the asymmetric traffic flow, the traffic burstiness, and the self-similar or fractal
nature of the traffic statistics, the Electro-optical interface has to include
several traffic management techniques. These are mainly packet buffering with
possible priority schemes, congestion control, shaping and policing.

¹ Short for “Reseau Optique Multiservice”. ROM is partially supported by the “Reseau
National de Recherche en Telecommunications” under the decision number 99S 0201-0204.

In order to optimise the logical performance at the network level, and due to
the lack of optical memories, distributed memory is, in many respects, the
only solution. The key idea is then to exploit, in an optimal way, the electronic
memories in the electronic interfaces at the edge nodes of the optical network
to classify the incoming traffic, to wait until payloads are filled in order to
optimise the transport, to reshape the traffic profile, and to regulate the traffic
in case of strong contention localised in the network.



2 Trends of Optical Multi-service Network 
Relevant to Our Study 

2.1 Client Network 

Subscribers ask for high QoS at low cost and offer a traffic mix consisting of data,
video, and audio. This pushes network providers to provision an optical multi-service
network at the core of electronic-based customer LANs (see Figure 1).
At the electronic functional layer, traffic aggregation, packet routing, and traffic
policing (access control at the interface level) are performed. At the internal,
optical layer, the focus is on transmission and low-layer switching functions.
The edge switches are located at the boundary between these two layers.

Fig. 1. The multi-service optical network structure (OS-ROM: optical switch of the
multi-service optical network; I-ROM: interface of the multi-service optical network)



2.2 CoS/QoS and Optical Bandwidth Allocation 

User applications have different characteristics and require different QoS levels. 
In this context the network should be able to offer a CoS/QoS to differentiate 
between the different application requirements. In this work, we consider three 
QoS levels: 

— QoS level 1, for strictly real-time services, with very low packet loss rate 
(< 10“®), strictly bounded delay and without loss of sequence in the optical 
nodes. 

— QoS level 2, for less strict real-time and priority non-real time services, with 
low packet loss rate (e.g.< 10“®), and without loss of sequence. 

— QoS level 3, best effort for non-real time services, with a packet loss rate 
monitored by the client layer protocol (IP in this study), without control of 
delay and packet sequence. 

The most efficient management of the bandwidth can be achieved by using long
packets covering all the available wavelength channels. This way, the classical
cross-connected optical transport network migrates towards a pure optical packet
network.



2.3 Interfacing Issues 

Very often, customers produce IP flows over various underlying infrastructures,
such as ATM, FR, etc. At the optical interface, the interoperability issue can be
simplified by considering the IP layer only and not the underlying links. Only IP
packets are to be processed, and the IP QoS paradigm applies. The IP/optical interface
raises challenging issues, mainly with respect to bringing together the electronic
and optical technologies. The first issue concerns bandwidth adaptation.
Today, classical LAN solutions carry traffic in the range of 100 Mbit/s to 10
Gbit/s. On the other hand, on a single optical fibre, bandwidth is available in
the form of separate wavelengths, each of which is able to carry at least 2.5 Gbit/s
and up to 40 Gbit/s. The second issue is related to the link and network layer
functions. Classical network architectures rely on well-known traffic management
techniques: packet buffering in the network node, with possible priority service
scheduling and congestion management. The all-optical core is unable to offer
the same traffic management capabilities, due to the limited optical buffer sizes.

2.4 Format of Optical Packet 

We consider packets of fixed length, as shown in Figure 2, which opens the way
to shaping techniques able to dramatically reduce the burstiness of a traffic
profile. With a better traffic profile, contention resolution in the optical nodes
is relaxed, which paves the way to the introduction of all-optical packet switching
nodes while offering a high level of QoS.

Fig. 2. Structure of the optical packet

The structure of the optical packet is composed of three main sub-blocks:

— Header - includes source address, destination address, priority, number of 
jumps, HEC, delineation, synchronisation, 

— Payload - includes some bits as a preamble to identify the position of the 
payload and to ease the packet jitter extraction, 

— Guard bands - inserted to help the switch read the relevant packet informa- 
tion while coping with length variations, thermal and chromatic dispersion 
effects etc. 

3 Model of Study 

3.1 Description 

The end-to-end model is shown in Figure 3. The external part, at the customer
side, consists of a set of traffic sources with different types of traffic. These
are real-time voice and video conference, video-on-demand and classical data
(WWW, FTP, ...), corresponding to the three levels of QoS, respectively, and are
to be differentiated according to those levels of priority. In this work, we investigate
the importance of traffic shaping for improving performance in the core of the
optical network. We specifically target acceptable and feasible sizes of fibre delay
lines. As video-on-demand and classical data traffic have a self-similar nature,
upon aggregation, the long-range dependence of this self-similar traffic tends to
be even more apparent. This type of traffic, when offered to an optical network,
which typically has little provision for buffering, results in poor performance
at the optical level. The remedy consists in shaping the traffic at the interface,
which is outside the optical functional area, where memory comes cheap.
Hence the novelty of the model we suggest lies in shaping the incoming flows
as they arrive from the customer and in decentralising the traffic management
and conditioning to the periphery, outside the optical network. The latter is
unable to perform those vital tasks and is solely capable of fast transmission. This
done, incoming electronic packets are assembled into optical packets at the
Electro-optical interface. Let us recall that in the ROM project, the edge node, where
the interface lies, is modelled by a set of control mechanisms: i) conditioning and
shaping of incoming traffic from the client layer, ii) differentiation, classification
and buffering of the different flows into separate buffers according to their level of
QoS, iii) a priority scheduling mechanism, for instance using a head of line scheme,
to take into account the different levels of priority, iv) filling of optical packets
with incoming variable-size IP packets, which may be segmented if
need be. The ROM edge node studied in this paper is a metropolitan optical
node with 8x8 ports and 4 wavelengths (λ) per port; each wavelength carries 10 Gbit/s.
The fibre delay lines associated with each output wavelength are dimensioned, as
are the electronic buffers implemented at the interface level.
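
Control mechanisms iii) and iv) can be illustrated with a short Python sketch: in each time slot the highest-priority non-empty queue is served (head of line), and one optical packet is filled with electronic packets of that single QoS level, segmenting the last one if it does not fit. The function name and the payload size used in the usage note are hypothetical, since the text does not give the optical payload length.

```python
from collections import deque


def fill_optical_packet(queues, payload_bytes):
    """Fill one optical packet from the highest-priority non-empty queue.

    queues: list of deques holding remaining electronic packet sizes in bytes,
            index 0 being the highest priority (CoS1).
    Returns (cos_index, bytes_filled), or None when every queue is empty.
    """
    for cos, queue in enumerate(queues):
        if not queue:
            continue
        filled = 0
        while queue and filled < payload_bytes:
            room = payload_bytes - filled
            if queue[0] <= room:
                filled += queue.popleft()   # the whole packet fits
            else:
                queue[0] -= room            # segment: the remainder stays at the head
                filled += room
        return cos, filled
    return None
```

For example, with a hypothetical 1000-byte payload and queues holding two 160-byte, one 256-byte and one 1500-byte packet, successive slots carry 320 bytes of CoS1, 256 bytes of CoS2, and then 1000 and 500 bytes of the segmented CoS3 packet. The under-filled slots show how the fill ratio depends on the packets available at each QoS level.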



3.2 System Parameters 

The self-similar traffic parameters are given in Table 1. These values correspond
to the mean throughput, standard deviation, Hurst parameter and time
scales used for synthesising the behaviour of self-similar sources for video conference,
video-on-demand and classical data. Voice sources are modelled by an ON/OFF
process with exponentially distributed ON durations of mean 0.4 s and exponentially
distributed OFF durations of mean 0.6 s. Each voice source emits at 32 kbit/s. Table 2
shows packet sizes and allocated bandwidth for each CoS considered in our model.
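
The voice model can be made concrete with a short Python sketch. It assumes that 0.4 s and 0.6 s are the means of the exponential ON and OFF durations, and that a source emits at its 32 kbit/s peak rate while ON and nothing while OFF. Under these assumptions the activity factor is 0.4/(0.4 + 0.6) = 0.4 and the mean rate is 12.8 kbit/s per source. The function names and the seed are illustrative.

```python
import random

PEAK_KBIT_S = 32.0   # emission rate while ON
MEAN_ON_S = 0.4      # mean ON duration (assumed to be the exponential mean)
MEAN_OFF_S = 0.6     # mean OFF duration (assumed to be the exponential mean)


def analytic_mean_rate():
    """Mean rate of one ON/OFF source: peak rate times the fraction of time spent ON."""
    activity = MEAN_ON_S / (MEAN_ON_S + MEAN_OFF_S)
    return PEAK_KBIT_S * activity


def simulated_mean_rate(cycles, seed=0):
    """Estimate the mean rate by drawing alternating exponential ON and OFF periods."""
    rng = random.Random(seed)
    on_time = off_time = 0.0
    for _ in range(cycles):
        on_time += rng.expovariate(1.0 / MEAN_ON_S)
        off_time += rng.expovariate(1.0 / MEAN_OFF_S)
    return PEAK_KBIT_S * on_time / (on_time + off_time)


print(analytic_mean_rate())        # 12.8 kbit/s up to floating-point rounding
print(simulated_mean_rate(20000))  # close to the analytic value
```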



Table 1. Self-similar parameters for each type of traffic

Type of traffic          video conference   video-on-demand   data
mean                     181 Kbit/s         2 Mbit/s          200 Mbit/s
standard deviation (σ)   35.10®             10®               10'^
H (Hurst)                0.51               0.8               0.9
time scale               3                  5                 6



Table 2. Bandwidth rate allocated to the 3 CoS considered

Class of Service             CoS1         CoS2         CoS3
Packet size                  160 octets   256 octets   1500 octets
Allocated bandwidth          727 Mbit/s   5.2 Gbit/s   13 Gbit/s
Number of sources            19452        2600         65
Percentage/total bandwidth   3.84 %       27.49 %      68.68 %
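
The figures in Table 2 can be cross-checked with a few lines of Python. The sketch recomputes each class's share of the total allocated bandwidth and checks that, for CoS2 and CoS3, the number of sources times the mean rate from Table 1 reproduces the allocated bandwidth. The variable names are illustrative. The CoS1 row is not decomposed because the split between voice and video conference sources is not given.

```python
# Allocated bandwidth per class of service, in Mbit/s (Table 2).
allocated = {"CoS1": 727.0, "CoS2": 5200.0, "CoS3": 13000.0}

total = sum(allocated.values())
percent = {cos: 100.0 * bw / total for cos, bw in allocated.items()}

# Number of sources times mean rate per source, in Mbit/s (Tables 1 and 2).
cos2_aggregate = 2600 * 2.0     # video-on-demand sources at 2 Mbit/s each
cos3_aggregate = 65 * 200.0     # data sources at 200 Mbit/s each

for cos, value in percent.items():
    print(f"{cos}: {value:.2f} % of {total:.0f} Mbit/s")
```

The recomputed shares agree with the last row of the table to within a few hundredths of a percentage point, and the CoS2 and CoS3 aggregates match the allocated bandwidth exactly.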



4 Results 

Next, we simulated the above-mentioned model. We investigate two scenarios,
with and without shaping of video-on-demand and classical data traffic.
We aim to study the dimensioning of the fibre delay lines subject to loss rate
constraints, and the trade-off between shaping parameters, in terms of buffer size
and shaping rate, and the sizes of the fibre delay lines necessary to meet the
given level of performance.

The first issue is that filling optical packets with incoming electronic ones
is not always straightforward. Optical packets need to be filled with packets
belonging to the same QoS level, as the optical transmission cannot distinguish
between payloads of the optical packets. Large incoming packets can
simply be segmented. However, if not enough packets are ready to be transmitted
on the optical side, the result is under-filled optical packets, which in turn results
in an under-utilised optical network.

We hence investigate the optical bandwidth utilisation versus the electronic
packet size, taking into account the cost of packet segmentation at the destination
level.

The second issue deals with the shaping of incoming traffic at the Electro-optical
interface.

4.1 Impact of Shaping on Performance

Figure 4 shows the probability density function of the number of optical
packets on one fibre delay line for the cases of shaping and no shaping. Both
curves decrease as the number of optical packets increases, showing higher density
at the lightly loaded side of the fibre delay line. The no-shaping curve, however,
shows an extremely high density even at the 20-packet level, which is beyond
feasible cases for fibre delay lines. This shows that without shaping, our loss rate
commitments cannot be met. In the case of shaping, the curve decreases more
rapidly than in the no-shaping case, reaching a value of 10“® at almost 7 optical
packets at the fibre delay line. This shows that only shaping achieves the
desired performance level that meets the QoS constraint.





Fig. 4. Probability Density Function 
of one output fibre delay line 



Fig. 5. Probability Density Function 
of buffer after shaper: buffer2 



Moreover, shaping the data traffic maintains the end-to-end delay in the 
network while contributing largely to the improvement of the logical performance 
in terms of packet loss, as shown in Q. Continuing with dimensioning, we next 
investigate the sizes of the buffers just after the shaper at the Electro-optical
interface. Let us recall that those buffers are used to differentiate and classify
flows according to their QoS levels. We again plot the probability density function
of the number of electronic packets in each buffer. Figure 5 shows the
probability density function of the number of packets in buffer 2,
assigned to video-on-demand traffic with medium QoS and thus medium priority. The
no-shaping curve shows high density at the 22-packet level, the point at which the
loss rate is not less than 10“"^; we need about 40 packets of buffer space to reach
our loss commitment. In the case of shaping, the curve decreases more rapidly
than in the no-shaping case, reaching a loss rate of 10“® at 22 packets in the buffer.
This performance improvement comes from implementing buffer shaping at the
interface level. In our study, we have considered a shaping buffer size of 2000 packets
for CoS2 traffic (referred to as shaping 1 in the figure) and a shaping buffer size of
1000 packets for CoS3 traffic (referred to as shaping 2 in the figure). The observed loss
in shaping 1 (respectively shaping 2) is 10“^ (respectively 10“^).
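
The role of the shaping buffer can be illustrated with a minimal slotted model in Python: a finite buffer absorbs bursts and releases at most a fixed number of packets per slot. This is a generic sketch of a rate-limiting shaper, not the shaper used in the ROM simulations. The function name, the slotted time base, and the example burst are assumptions; only the 2000-packet buffer size is taken from the text.

```python
def shape(arrivals, rate_per_slot, buffer_size):
    """Pass a slotted arrival sequence through a finite-buffer, fixed-rate shaper.

    arrivals: number of packets arriving in each time slot.
    Returns (departures per slot, number of packets lost to buffer overflow).
    """
    backlog = 0
    lost = 0
    departures = []
    for arriving in arrivals:
        accepted = min(arriving, buffer_size - backlog)  # the overflow is dropped
        lost += arriving - accepted
        backlog += accepted
        sent = min(backlog, rate_per_slot)               # drain at the shaping rate
        backlog -= sent
        departures.append(sent)
    return departures, lost


# A burst of 10 packets followed by silence, shaped to 2 packets per slot.
smoothed, dropped = shape([10, 0, 0, 0, 0], rate_per_slot=2, buffer_size=2000)
print(smoothed, dropped)  # [2, 2, 2, 2, 2] 0
```

The burst leaves the shaper as a steady stream, at the price of queueing delay in the shaping buffer. A smaller buffer trades that delay for loss at the edge instead.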

We next measured mean delays at the optical node and end to end.
The results are shown in Table 3. Traffic shaping clearly improves
the delay at the optical level for the different CoS.



Table 3. Mean delays

        Mean delay at the optical node      Mean end-to-end delay
        without shaping   with shaping      without shaping   with shaping
CoS1    800 ns            516 ns            1.310 ms          2.11 ms
CoS2    770 ns            513 ns            1.87 ms           2.84 ms
CoS3    1380 ns           569 ns            3.48 ms           7.9 ms



The confidence level associated with the delay values is 95%. In the case of
shaping, we obtained confidence intervals between 0.4% and 0.8% for the different CoS.
The no-shaping case gives larger confidence intervals (around 30%) because of the high
variability of self-similar traffic over large time scales.

4.2 Optical Bandwidth Utilisation 

In the above study, we scheduled the electronic packets at the optical access
according to the HoL (Head of Line) policy. In every time slot, we fill the available
optical packets with CoS1, CoS2 and CoS3 traffic, respecting the priority. In this
section, we focus on the comparison between this resource allocation mechanism,
which will be referred to as the total flexibility mechanism, and another one, partial
flexibility.

The partial flexibility mechanism consists in dedicating λ1 to CoS1 traffic only.
CoS2 and CoS3 share λ2, λ3 and λ4 according to priority. The aim is to
investigate the performance at the optical node level for both partial flexibility
and total flexibility. One way to accomplish this is to vary the CoS1
share of the total traffic and compare the performance of the two
mechanisms. We consider the same system model studied above, assuming that
all traffic is already shaped before entering the optical network, and we vary
the CoS1 traffic percentage from 6% to 66%. Figures 6 and 7 show the probability
density function of fibre delay line 1 for 6% CoS1 and 66% CoS1, respectively.
Partial flexibility gives better performance, in terms of delay line 1 occupancy,
than total flexibility in the 6% CoS1 case. This small percentage cannot use all
the bandwidth offered by λ1, but when the CoS1 percentage becomes large (66%), λ1
is no longer sufficient to transport CoS1 and thus the associated delay line becomes
saturated. Figure 8 confirms this by showing the evolution of the delay line 1 occupancy
versus the CoS1 percentage. Next, we show the performance of the remaining fibre delay
lines, associated with λ2, λ3 and λ4. Figure 9 illustrates the delay line
2 occupancy in the partial flexibility case. This plot clearly shows that
delay line 2 is almost saturated for 6%, 11% and 15% CoS1. This is due
to the correspondingly high percentage of CoS2 and CoS3 traffic, which cannot
access the bandwidth offered by λ1.

Fig. 6. Probability Density Function of one output fibre delay line 1: 6% CoS1 traffic

Fig. 7. Probability Density Function of one output fibre delay line 1: 66% CoS1 traffic

Fig. 8. Probability Density Function of one output fibre delay line 1 in the case of partial flexibility

Fig. 9. Probability Density Function of one output fibre delay line 2 in the case of partial flexibility
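
The two allocation mechanisms can be sketched in Python as a mapping from class of service to usable wavelengths, together with a back-of-the-envelope load calculation. The function names are hypothetical, and the mean utilisation of 0.5 used in the usage note is an assumed value, since the total offered load of these experiments is not stated. The sketch only shows why the dedicated wavelength is lightly loaded when the CoS1 share is small and overloaded when it is large.

```python
NUM_WAVELENGTHS = 4  # wavelengths per port in the edge node described in Section 3.1


def allowed_wavelengths(cos, mechanism):
    """Wavelength indices that traffic of class `cos` (1, 2 or 3) may use."""
    if mechanism == "total":
        return (1, 2, 3, 4)
    if mechanism == "partial":
        return (1,) if cos == 1 else (2, 3, 4)
    raise ValueError(f"unknown mechanism: {mechanism}")


def offered_load_per_wavelength(cos1_fraction, mean_utilisation, mechanism):
    """Offered load per wavelength (1.0 = one wavelength's capacity) for each group.

    mean_utilisation is the total offered load divided by the capacity of all
    four wavelengths (an assumed input, not a value from the text).
    """
    total = mean_utilisation * NUM_WAVELENGTHS  # in units of one wavelength
    if mechanism == "total":
        return {"lambda1-4": total / NUM_WAVELENGTHS}
    if mechanism == "partial":
        return {
            "lambda1": cos1_fraction * total,
            "lambda2-4": (1.0 - cos1_fraction) * total / (NUM_WAVELENGTHS - 1),
        }
    raise ValueError(f"unknown mechanism: {mechanism}")


low = offered_load_per_wavelength(0.06, 0.5, "partial")
high = offered_load_per_wavelength(0.66, 0.5, "partial")
print(low, high)
```

With the assumed utilisation, a 66% CoS1 share offers more than one wavelength's worth of traffic to λ1 alone, which is consistent with the saturation of delay line 1 described above.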



Figure 10 shows the optical packet fill ratio versus the CoS1 traffic percentage obtained
with the two resource allocation mechanisms considered. Partial flexibility
gives better utilisation than total flexibility in all cases,
especially when the CoS1 percentage becomes large (>66%). This can be explained
by the different electronic packet sizes associated with each CoS. By increasing
the CoS1 percentage, we increase the number of small packets (160 bytes), and thus
the optical slots are filled more efficiently.

5 Conclusion 

In this paper, we consider a multi-service optical network where optics provide
the underlying network architecture with the QoS requested by client layers at the
periphery of the optical network. We focus on end-to-end traffic management
between the client layer and the ROM network in the presence of different traffic
QoS requirements. Due to the lack of optical memories, the key idea is to
exploit, in an optimal way, the electronic memories in the electronic interface
at the edge nodes of our optical network. Results have shown that shaping of
data traffic eases contention resolution in the core nodes by reducing the
burstiness of the traffic profile and by enabling the exploitation of optical
resources, in terms of time, space and spectral techniques, in an efficient manner.
There is clearly a trade-off to analyse between shaping parameters and delay
line sizes to reach a target performance in terms of loss and delay. By simulation,
we have obtained suitable values of fibre delay line capacities using a shaping
mechanism at the periphery of the ROM network. We have also studied the efficiency
of optical packet utilisation by comparing two optical resource allocation
mechanisms, namely total flexibility and partial flexibility.

Fig. 10. Optical packet utilisation efficiency vs. CoS1 traffic percentage



Abstract. In this paper, we present a comparison of the blocking performance
in wavelength routed optical networks with chordal ring and mesh-torus
topologies. The comparison is focused on chordal rings with a chord length of
$\sqrt{N}+3$, where N is the number of nodes, since this chord length leads to the
smallest network diameter. This performance comparison revealed an important
feature of chordal rings: very small blocking gains were observed when
the node degree increases from 3 (chordal ring with a chord length of
$\sqrt{N}+3$) to 4 (mesh-torus). The comparison is made for networks with 100 and
1600 nodes. The influence of wavelength interchange on these small gains is
also investigated: the node degree gain is very small and increases slightly as
the converter density increases. Thus, if a small blocking performance
degradation is allowed, the choice of a chordal ring with a chord length of
$\sqrt{N}+3$, instead of a mesh-torus, leads to a reduction in the number of network
links, and hence in the total cable length, since the number of links in an N-node
chordal ring is 3N, and the number of links in an N-node mesh-torus is 4N.



1 Introduction 

IP (Internet Protocol) networks based on WDM (Wavelength Division Multiplexing)
are expected to offer an infrastructure for the next generation Internet [1]-[2].
However, up to now, WDM has been used to satisfy the bandwidth requirements
imposed by traffic growth. Actually, the worldwide deployment of WDM systems
is seen as the first phase of optical networking. After this phase, the introduction of
optical add/drop multiplexers in a linear architecture and the use of WDM protection
switches are expected. This architecture will rapidly evolve to a WDM ring
architecture, and a further possible evolution scenario may be the interconnection of
WDM rings and mesh networks [3]. Whereas the evolution from the point-to-point
WDM transmission system to interconnected rings is clear from a physical topology
point of view, the optimal topology to be used for the mesh network has been less
studied. In [4], a study is presented of the influence of node degree on the fiber
length, capacity utilization, and average and maximum path lengths of wavelength
routed mesh networks. It is shown that average node degrees varying between 3 and
4.5 are of particular interest.

Here, we consider chordal rings, which are a particular family of regular graphs of 
degree 3 [5]. A chordal ring is basically a ring network, in which each node has an 
additional link, called a chord. The number of nodes in a chordal ring is assumed to 
be even, and nodes are indexed 0, 1, 2, ..., N-1 around the N-node ring. It is also
assumed that each odd-numbered node i (i = 1, 3, ..., N-1) is connected to a node
(i+w) mod N, where w is the chord length, which is assumed to be positive odd and,
without loss of generality, we also assume that w ≤ N/2, as in [5]. In this paper, we
compare the blocking performance of chordal ring networks with a chord length of
√N + 3, where N is the number of nodes, with the performance of mesh-torus networks
which have a node degree of 4.

For a given number of nodes N, different chordal rings can be obtained by 
changing the chord length. Fig. 1 shows a chordal ring with N=20 and w=l. Fig. 2 
shows a mesh-torus network with 16 nodes. 
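
The chordal ring construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tool: the function names `chordal_ring` and `hop_counts` are hypothetical, and the hop-length counts are obtained numerically by breadth-first search from a single source node rather than from a closed-form expression.

```python
from collections import deque
from math import isqrt


def chordal_ring(n: int, w: int) -> dict[int, set[int]]:
    """Adjacency sets of an n-node chordal ring with chord length w.

    Assumes n is even and w is a positive odd integer, as in the text.
    """
    if n % 2 != 0 or w <= 0 or w % 2 == 0:
        raise ValueError("n must be even and w a positive odd integer")
    # Ring links: every node i is linked to i-1 and i+1 (mod n).
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    # Chords: each odd-numbered node i is linked to node (i + w) mod n.
    for i in range(1, n, 2):
        j = (i + w) % n
        adj[i].add(j)
        adj[j].add(i)
    return adj


def hop_counts(adj: dict[int, set[int]], src: int) -> list[int]:
    """Number of nodes at each shortest-path distance from src (index = hops)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts


n = 100
w = isqrt(n) + 3               # chord length sqrt(N) + 3, i.e. 13 for N = 100
ring = chordal_ring(n, w)
counts = hop_counts(ring, 0)   # counts[l] = number of nodes l hops away from node 0
```

Dividing `counts[l]` by N-1 gives an empirical hop-length distribution for that source node, which is a convenient way to cross-check a closed-form expression for a specific N and w.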

The remainder of this paper is organized as follows. The analytical model used to 
compute the blocking probability in wavelength routed optical networks is briefly 
described in section 2. The performance comparison of chordal ring and mesh-torus 
networks is presented in section 3. Main conclusions are presented in section 4. 



2 Evaluation of Blocking Probability 

To compute the blocking probability in optical networks with wavelength 
interchange, we have used the model given in [6], since it applies to ring topologies, 
has a moderate computational complexity, and takes into account dynamic traffic and 
the correlation between the wavelengths used on successive links of a multi-link path. 
Moreover, this model is suitable for the study of the influence of wavelength 
interchange on the network performance. 

The following assumptions are used in the model [6]: 1) Call requests arrive at 
each node according to a Poisson process with rate λ, with each call equally likely to
be destined to any of the remaining nodes; 2) Call holding time is exponentially
distributed with mean 1/μ, and the offered load per node is ρ = λ/μ; 3) The path used
by a call is chosen according to a pre-specified criterion (e.g. random selection of a 
shortest path), and does not depend on the state of the links that make up a path; the 
call is blocked if the chosen path can not accommodate it; alternate path routing is not 
allowed; 4) The number of wavelengths per link, F, is the same on all links; each 
node is capable of transmitting and receiving on any of the F wavelengths; each call 
requires a full wavelength on each link it traverses; 5) Wavelengths are assigned to a 
session randomly from the set of free wavelengths on the associated path. 
In addition to the above assumptions, it is assumed in [6] that, given the loads on
links 1, 2, ..., i-1, the load on link i of a path depends only on the load on link i-1
(Markovian correlation model).

The analysis presented in [6] also assumes that the hop-length distribution is 
known, as well as the arrival rates of calls at a link that continue, and those that do 
not, to the next link of a path. The call arrival rates at links have been estimated from 
the arrival rates of calls to nodes, as in [6]. The hop-length distribution is a function 
of the network topology and the routing algorithm, and is easily determined for most 
regular topologies with the shortest-path algorithm. For the bi-directional N-node
mesh-torus network, the hop-length distribution was easy to find. However, for the
bi-directional N-node chordal ring with chord length w, it was not possible to obtain a
general expression for the hop-length distribution. We have found analytical
expressions for the hop-length distribution when the chord length is 3 (w = 3), when
the chord length is maximum (w = N/2 or w = N/2-1), when the chord length is as close
as possible to the mean chord length (w = N/4), and for a chord length that leads to the
smallest network diameter (w = √N + 3). In this paper, we concentrate on the latter
chord length (w = √N + 3). For this chord length, the hop-length distribution, p_l, with
N = m² and m = 10+2k (k = 0, 1, 2, 3, ...), is given by:



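$$
p_l =
\begin{cases}
\dfrac{3l}{N-1}, & \text{for } 1 \le l \le \dfrac{m}{2}+1 \\[2ex]
\dfrac{2m+6-l}{N-1}, & \text{for } \dfrac{m}{2}+2 \le l \le m-4 \text{ and } N \ge 144 \\[2ex]
\dfrac{\ldots}{N-1}, & \text{for } l = m-3 \\[2ex]
\dfrac{13}{N-1}, & \text{for } l = m-2 \\[2ex]
\dfrac{4}{N-1}, & \text{for } l = m-1
\end{cases}
\qquad (1)
$$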



It can be shown that the average hop length, H̄, of a chordal ring with N = m²
nodes, where m = 10+2k (k = 0, 1, 2, 3, ...), and with a chord length of w = √N + 3, is
given by:

- _ iVH-59Viv -144_ -14AfmH-45A^H-374Viv -288 (2) 

~ 2N-2 24A1-24 ' 



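For an N-node mesh-torus network, with N = m², m ≥ 4 and m even, the hop-length
distribution, p_l, is given by:

$$
p_l =
\begin{cases}
\dfrac{4l}{N-1}, & \text{for } 1 \le l \le \dfrac{m}{2}-1 \\[2ex]
\dfrac{2m-2}{N-1}, & \text{for } l = \dfrac{m}{2} \\[2ex]
\dfrac{4m-4l}{N-1}, & \text{for } \dfrac{m}{2}+1 \le l \le m-1 \\[2ex]
\dfrac{1}{N-1}, & \text{for } l = m
\end{cases}
\qquad (3)
$$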



The average hop length, H̄, for an N-node mesh-torus network, with N = m², m ≥ 4
and m even, is given by:

$$
\bar{H} = \frac{N\sqrt{N}}{2(N-1)}. \qquad (4)
$$
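
The mesh-torus hop-length distribution can be cross-checked numerically. The following Python sketch is an illustration only (the function names are hypothetical): it compares the piecewise closed form, namely 4l nodes for l < m/2, 2m-2 nodes for l = m/2, 4m-4l nodes for m/2 < l < m, and one node for l = m, with a brute-force enumeration of wrap-around Manhattan distances on an m x m torus, and then evaluates the average hop length exactly with rational arithmetic.

```python
from fractions import Fraction


def torus_hop_counts_closed_form(m: int) -> list[int]:
    """Number of nodes at hop distance l = 1..m in an m x m mesh-torus (m even)."""
    counts = []
    for l in range(1, m + 1):
        if l < m // 2:
            counts.append(4 * l)
        elif l == m // 2:
            counts.append(2 * m - 2)
        elif l < m:
            counts.append(4 * m - 4 * l)
        else:
            counts.append(1)          # the single antipodal node
    return counts


def torus_hop_counts_brute_force(m: int) -> list[int]:
    """Same counts, by enumerating shortest-path distances from node (0, 0)."""
    counts = [0] * m
    for x in range(m):
        for y in range(m):
            d = min(x, m - x) + min(y, m - y)   # wrap-around Manhattan distance
            if d > 0:
                counts[d - 1] += 1
    return counts


def average_hop_length(counts: list[int]) -> Fraction:
    """Exact mean hop length for counts indexed from l = 1."""
    weighted = sum(l * c for l, c in enumerate(counts, start=1))
    return Fraction(weighted, sum(counts))


m = 10                                 # N = m^2 = 100 nodes
closed = torus_hop_counts_closed_form(m)
brute = torus_hop_counts_brute_force(m)
avg = average_hop_length(closed)       # equals m^3 / (2 (N - 1)) = 500/99
```

For m = 10 the two count lists agree and the average is 500/99, about 5.05 hops, in line with an average hop length of order m.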

The performance assessment presented in this paper is based on the evaluation of
the path blocking probability. However, a direct comparison of blocking probabilities
is sometimes preferred in order to highlight some features and to quantify
benefits. Besides the blocking probability, we therefore consider the blocking gain
due to an increase in the node degree (G_nd). Since only blocking gains are considered
in this paper, G_nd is also called the node degree gain. This metric is defined as:

$$
G_{nd} = \frac{P_{(D-1)}}{P_{D}}, \qquad (5)
$$

where P_(D-1) is the blocking probability in a network with a node degree D-1, and
P_D is the blocking probability in a network with a node degree D (both obtained for
the same number of nodes, wavelengths per link, and load per node).



3 Performance Comparison 

In this section we compare the blocking performance of mesh-torus networks and of
chordal ring networks with a chord length of √N + 3, where N is the number of nodes.

Fig. 3 shows the blocking gain due to an increase of the node degree from 3 to 4,
as a function of the load per node, for chordal ring and mesh-torus networks, both
with 100 nodes and without wavelength interchange. For w = N/4, gains of the order of
10^2 and 10^5 were obtained for 8 and 16 wavelengths per link, respectively, with a
load per node of 0.1 Erlang. For the case of w = √N + 3, very small gains were
observed. As the load per node decreases from 5 Erlang to 0.01 Erlang, the variation
of the node degree gain remains within one order of magnitude for the numbers of
wavelengths per link considered: 4, 8, 12 and 16. In the following we consider
only chordal rings with a chord length of √N + 3.

Fig. 4 shows the blocking probability versus converter density for chordal ring and 
mesh-torus networks, both with 100 nodes and a load per node of 0.5 Erlang. As can 
be seen, the blocking probabilities in both cases are close. The corresponding node 
degree gain for 12 wavelengths per link is depicted in Fig. 5. From this figure, we 
may observe that the node degree gain slightly increases as the converter density 
increases. Besides, the node degree gain is very small. 

We have further increased the number of nodes to 1600. Fig. 6 shows the blocking
probability versus load per node for chordal ring and mesh-torus networks, both with
1600 nodes and without wavelength interchange. In this case, as the load per node
decreases from 0.1 Erlang to 0.001 Erlang, the variation of the node degree gain
remains within one order of magnitude for the numbers of wavelengths per link that
we have considered: 4, 8 and 12. See also Fig. 7 for the case of 12 wavelengths, where
the difference between both curves is higher. Fig. 8 shows the influence of converter
density on the blocking probability for mesh-torus and chordal rings with w = √N + 3,
both with 1600 nodes and a load per node of 0.1 Erlang. In both networks,
wavelength interchange is more helpful as the number of wavelengths per link
increases, but, even for the case of 12 wavelengths per link, the node degree gain
increases from 6.1 to only 127.9, as the converter density increases from 0 to 1 (see
Fig. 9). Again, the node degree gain is very small and increases slightly as the
converter density increases.

Fig. 3. Blocking gain due to an increase of the node degree from 3 (chordal ring networks with
100 nodes) to 4 (mesh-torus networks with 100 nodes), for chord lengths of N/4 and √N + 3
and without wavelength interchange. Curves are shown for both chord lengths with F = 4, 8,
12 and 16 wavelengths per link.

Fig. 4. Blocking probabilities versus converter density for chordal rings with w = √N + 3 and
mesh-torus networks, both with 100 nodes and a load per node of 0.5 Erlang. Curves are shown
for both topologies with F = 4, 8, 12 and 16.

Fig. 5. Blocking gain due to an increase of the node degree from 3 to 4, as a function of the
converter density, for chordal ring networks with w = √N + 3 and mesh-torus networks, both
with 100 nodes, 12 wavelengths per link and a load per node of 0.5 Erlang.

Fig. 6. Blocking probabilities versus load per node [Erlang] for chordal ring networks with
w = √N + 3 and mesh-torus networks, both with 1600 nodes and without wavelength
interchange. Curves are shown for both topologies with F = 4, 8 and 12.

Fig. 7. Blocking gain due to an increase of the node degree from 3 to 4, as a function of the
load per node, for chordal ring networks with w = √N + 3 and mesh-torus networks, both with
1600 nodes and 12 wavelengths per link without interchange.

Fig. 8. Blocking probability versus converter density for chordal ring networks with w = √N + 3
and mesh-torus networks, both with 1600 nodes and a load per node of 0.1 Erlang. Curves are
shown for both topologies with F = 4, 8 and 12.

Fig. 9. Blocking gain due to an increase of the node degree from 3 to 4, as a function of the
converter density, for chordal ring networks with w = √N + 3 and mesh-torus networks, both
with 1600 nodes and a load per node of 0.1 Erlang. Curves are shown for F = 4, 8 and 12.

These very small node degree gains, observed as we increase the node degree from
3 (chordal rings with w = √N + 3) to 4 (mesh-torus networks), may be explained by
the dependence of p_l (hop-length distribution) on l in both cases. The average hop
length is of the order of m, O(m), in both cases (see equations 2 and 4).

If a small blocking performance degradation is allowed, the choice of chordal ring
networks with w = √N + 3, instead of mesh-torus networks, leads to a reduction in the
number of network links, and hence in the total cable length, since the number of
links in an N-node chordal ring is 3N, and the number of links in an N-node mesh-torus
is 4N. However, there are some restrictions that limit the practical implementation of
chordal rings with w = √N + 3 (as well as mesh-torus networks), when compared with
other chord lengths. In fact, the smallest network diameter was obtained with
w = √N + 3 for an N-node chordal ring, where N is a square (N = m²) and N > 64. The
restriction associated with the square is not imposed on other chord lengths such as
w = N/4 or w = 3.
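
As a quick worked check of the link-count argument, taking the link counts of 3N and 4N stated in the text at face value, the relative saving does not depend on N:

$$
\frac{4N - 3N}{4N} = \frac{1}{4}.
$$

For the two network sizes studied here, this corresponds to 300 instead of 400 links for N = 100, and 4800 instead of 6400 links for N = 1600, i.e. 25% fewer links in both cases.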



4 Conclusions 

We have presented a performance comparison of wavelength routing optical
networks with chordal ring and mesh-torus topologies. This comparison revealed an
important feature of chordal rings: the performance of chordal ring networks with a
chord length of √N + 3 is similar to the performance of mesh-torus networks. For
this comparison, networks with 100 and 1600 nodes have been considered.
Concerning the influence of wavelength interchange on the small node degree gains,
it was shown that the node degree gain is very small and increases slightly as the
converter density increases.

Since the performance of mesh-torus networks and of chordal rings with a chord
length of √N + 3 is similar, the choice of a chordal ring with a chord length of
w = √N + 3, instead of a mesh-torus network, leads to a reduction in the number of
network links, and hence in the total cable length, since the number of links in an
N-node chordal ring is 3N, and the number of links in an N-node mesh-torus is 4N.
Abstract. The differentiated services (DS) architecture provides Quality
of Service (QoS) assurance to different classes of service (CoSs). Our
previous research results show that both intradomain and interdomain
best-effort traffic can have an adverse impact on the interdomain TCP
assured service traffic. With the measurement-based connection-oriented
assured service model we developed in our previous research, we are able
to provide end-to-end TCP throughput assurance for each CoS. However,
the throughput of each individual TCP session within a CoS may not be
guaranteed. In this paper, we propose modified marking and dropping
policies based on our previous research results. The simulations show
that with these techniques, the end-to-end throughput for each individual
TCP flow can be significantly improved. The scheme also maintains the
high scalability of the DS architecture, since it does not require the core
routers to keep per-flow state information.



1 Introduction 

As the Internet evolves into a global commercial infrastructure, there is a grow- 
ing need to support quality of service (QoS) to applications. Recently, a radical 
approach, known as differentiated services (DS) |1I5| . has attracted much atten- 
tion. The DS model is based on the assumption that resources are abundant in 
the core and bottlenecks occur only at the border nodes between domains. While 
offering multiple CoSs, the DS model ensures scalability by keeping a stateless 
core and adhering to the IP (i.e. Internetworking Protocol) hop-by-hop forward- 
ing paradigm. However, a key problem for this model is the conflict between 
maximizing resource utilization and achieving a high service assurance. In order 
to provide high service assurance, enough resources need to be provisioned to all 
the possible paths in the direction from a source to a destination. 

In our previous papers, 0, we developed a Measurement based Connection- 
Oriented Assured Service Model (MCOAS). With MCOAS, a high end-to-end 
service assurance can be achieved for the TCP traffic of assured service (AS), 
while a reasonably high throughput for best-effort traffic is also maintained. 
However, there is an issue requiring further study. In the MCOAS model, the 
fairness problem exists between each TCP flow within the aggregated flow. In
other words, in the MCOAS model, the end-to-end throughput assurance is
provided only in aggregate to the AS traffic. I. Yeom and A. Reddy observed the
unfairness issue, and improved the model to achieve better service assurance and
fairness I3HE]-. However, these works assume that the source will always send
data as fast as it can, or require that the edge devices be able to notify the sender
to slow down. In order to solve these problems, in this paper, we propose new
marking, measuring and dropping algorithms which are able to solve the fairness
problem based on the MCOAS model. This paper heavily relies on our previous
research work|^.

The rest of the paper is organized as follows. Section 2 presents a background 
introduction on MCOAS model and our research motivation. Section 3 describes 
the proposed schemes including marking and dropping algorithms. Section 4 
gives the experiment results on the proposed scheme. Finally, Section 5 concludes 
the paper and presents future research directions. 



2 Background 

The original idea for designing AS for TCP applications was proposed by Clark 
and Fang pj. Each AS session using TCP receives a guaranteed minimum band- 
width called target rate. Traffic is policed at every Internet service provider 
(ISP) domain edge node. At the edge node, conformant packets are marked as 
IN-profile and non-conformant packets are marked as OUT-profile. Both IN and
OUT packets are injected into the core of the network, and the OUT packets
are treated the same way as the best-effort packets. In each core router, a single
first-in-first-out (FIFO) queue is used for both AS and best-effort traffic.
A 2-level RED (i.e. random early detection) fTTini packet dropping algorithm, 
called RIO (RED in-and-out) p], is run based on traffic type. At each and every 
domain boundary, traffic is policed locally, and packets are subject to remarking 
before being injected into another domain, based on local congestion situation. 
However, since there is no end-to-end resource provisioning, an end-to-end ser- 
vice assurance is not guaranteed. Several other works on the improvement of 
this model in an attempt to achieve better service assurance and fairness were 
proposed [7pSf2j . all based on a connectionless forwarding paradigm. 

The above studies did not consider the end-to-end performance of the AS TCP
sessions in the presence of possible cross best-effort traffic with a small
round-trip time (RTT). The cross traffic could occur within a domain or at a domain
boundary. A question is whether local control at each domain boundary can 
guarantee end-to-end performance for AS flows that cross multiple domains. To 
answer this question, we did simulation tests on the approach proposed in |0|, 
using NS-2 from LBNL (Lawrence Berkeley National Laboratory).

The network setup is shown in Fig. 1. It is composed of three domains. The
link bandwidth between routers is 33 Mbps each, and the buffer size is 50
packets for each output port of a router. Each host in domain 1 has an AS TCP
session with one of the hosts in domain 3. The aggregated target rate of these
10 hosts for AS is 30 Mbps.




Fig. 1. Network setup for experiment 1 



Table 1. Aggregated Throughputs/Goodputs of Assured Services for experiments 1
and 2

Flow    RTT (ms)   Target     Test 1      Test 2
0       20         5 Mbps     5.2/5.1     3.6/3.5
1       20         1 Mbps     3.0/2.8     1.0/0.9
2       40         5 Mbps     4.1/3.9     2.8/2.7
3       40         1 Mbps     2.0/1.9     1.0/0.9
4       50         5 Mbps     4.0/4.0     3.0/3.0
5       50         1 Mbps     2.1/1.9     0.8/0.8
6       70         5 Mbps     3.7/3.6     2.6/2.6
7       70         1 Mbps     1.6/1.5     0.8/0.8
8       100        5 Mbps     3.5/3.4     2.6/2.6
9       100        1 Mbps     1.2/1.2     0.7/0.6
Total   -          30 Mbps    30.3/29.4   18.9/18.3



The parameters for RIO are set at (min_in, max_in, P_in) = (40, 70, 0.02) for
IN packets and (min_out, max_out, P_out) = (10, 30, 0.2) for OUT packets. For more
details on RIO, please refer to |S|. 

Two experiments were performed. In the first experiment, we assumed that 
there is no cross traffic. In the second experiment, we added 50 best-effort TCP 
flows between nodes R2 and R3 with a RTT of 10 ms each, representing the 
intradomain cross traffic (we can also interpret it as the interdomain cross traffic
coming into domain 2 via R2 and going out to other domains via R3).

Table 1 summarizes the results for the two experiments. Here we focus on the
aggregated throughput and goodput. As one can see, without cross traffic RIO
achieves rather high aggregated service assurance. However, the situation becomes
quite different in the presence of the cross best-effort traffic. Most of the AS
sessions fall short of their target rates. Even worse, the achieved aggregated
throughput and goodput are only about two thirds of the target value.

In our previous paper, we developed the MCOAS model, which can provide
end-to-end TCP throughput assurance. The MCOAS model is a connection-oriented
service model composed of a series of measures, which include: (a) a path
pinning mechanism for AS allowing aggregated bandwidth reservation for AS at
each intermediate router in the forwarding path; (b) a packet marking strategy;
(c) a dropping policy; (d) an adaptive dropping-threshold calculation method
for queue management based on aggregated reserved bandwidth and real-time
traffic measurement.




Fig. 2. Aggregated throughputs/goodputs for MCOAS and RIO 



To show the performance of MCOAS, we consider a network setup with one
more domain in the data path, as shown in Fig. 2. In this experiment, there are
10 AS TCP sessions between the hosts in domain 1 and domain 4. Their target
rates and RTTs are listed in Table 2. The aggregated target rate is 30 Mbps.
The link capacities between R1 and R2, R2 and R3, R3 and R4, and R5 and R6
are all 33 Mbps. The link capacity between R4 and R5 is 50 Mbps. There are 30
cross best-effort flows from R2 to R3 in domain 2 and 30 cross best-effort flows
from R5 to R6 in domain 3. The simulation is performed for MCOAS with n = 2
and RIO. Both throughput and goodput are measured. This time we want to
test the performance of each session, and the results are listed in Table 2 with a
slash separating the throughput from the goodput.



Table 2. Individual throughput/goodput for MCOAS and RIO

Flow    RTT (ms)   Target (Mbps)   RIO         MCOAS
0       20         5               3.4/3.3     4.7/4.5
1       20         1               0.9/0.8     2.7/2.6
2       40         5               3.2/3.1     4.4/4.3
3       40         1               0.6/0.6     2.0/1.9
4       50         5               2.9/2.8     4.0/4.0
5       50         1               0.8/0.7     2.0/1.8
6       70         5               2.4/2.4     3.8/3.7
7       70         1               0.8/0.7     1.5/1.4
8       100        5               2.8/2.8     3.0/3.0
9       100        1               0.5/0.5     1.3/1.2
Total   -          30              18.1/17.6   29.2/28.4



Table 3. Aggregated throughputs/goodputs of best-effort traffic for MCOAS and RIO,
and the total throughputs/goodputs for MCOAS in domain 2 and domain 3

                         In Domain 2    In Domain 3
RIO                      14.40/12.38    29.25/27.46
MCOAS                    1.74/1.51      19.30/17.48
Total Rate with RIO      32.53/30.00    47.38/45.08
Total Rate with MCOAS    30.95/29.93    48.51/45.90



As one can see, MCOAS outperforms RIO for all the AS sessions and again, it 
offers superior performance to RIO in terms of aggregated throughput guarantee. 

To see the performance of the best-effort traffic, Table 3 lists the aggregated
throughput/goodput for the cross best-effort traffic in both domains. As expected,
MCOAS gracefully suppresses the best-effort traffic in both domain 2
and domain 3, offering rather high goodputs at about 1.5 Mbps and 17.5 Mbps
in the respective domains. Hence, MCOAS can locally suppress cross traffic,
resulting in a near-optimal global resource utilization. For detailed information
about the MCOAS model, please refer to [2].

However, an open issue in the MCOAS model is how to solve the fairness
issue between the individual flows. From Table 2 we can see that a flow with a
lower target rate achieves a higher throughput than it requires, while a flow with a
higher target rate achieves less throughput than it requires. Since the ultimate
target is to provide per-flow QoS assurance to each user, it is necessary for us
to solve this fairness problem.
3 Proposed Scheme

In this section, we propose our solutions based on our MCOAS model to solve 
the fairness problems. 



3.1 Marking Policy 

In MCOAS, the packets from AS flows are marked as AS or EX packets. The EX
packets from all the AS flows are treated in the same way. However, dropping
EX packets from different TCP sessions in fact has a different impact on
the throughput.

Fig. 3 illustrates the different impact. In the figure, "x" represents a
packet drop. Assuming that both TCP sessions experience a packet drop at the
same time, the average throughput of the session with the higher
target rate will not meet its target rate, while the session with the lower target
rate will. Dropping an EX packet from the session with the
high target rate therefore has a more significant impact. Notice that in this
example the queue is primarily occupied by EX packets from the low target rate
session, which increases the probability of dropping a packet from the high target
rate session. This causes the fairness issue we have observed. In order to solve this
problem, we propose a modified marking policy.

In RIO and MCOAS presented in the previous sections, the sending rate is
measured with the Time Sliding Window (TSW) algorithm, which is described
as follows.

Initially:

    Win_length = a constant;
    R_tsw = connection's target rate, R_target;
    T_front = 0;

Upon each packet arrival, TSW updates its state variables as follows:

    Bytes_in_TSW = R_tsw * Win_length;
    New_bytes = Bytes_in_TSW + pkt_size;
    R_tsw = New_bytes / (now - T_front + Win_length);
    T_front = now;

where now is the time of the current packet arrival, and pkt_size is the
packet size of the arriving packet.

For the details, please refer to 0. The TSW algorithm only measures the 
average rate within the fixed-length time window. Since the window length is usually
chosen to be short, the sending rate measured by the TSW algorithm can be
considered to be the real-time sending rate.
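
The TSW update rule can be written as a small Python class. This is a minimal sketch for illustration only: the class name `TimeSlidingWindow` and the choice of bytes and seconds as units are assumptions, not part of the original description.

```python
class TimeSlidingWindow:
    """Rate estimator following the TSW update rule described in the text.

    Units are an assumption of this sketch: rates in bytes per second,
    times in seconds, packet sizes in bytes.
    """

    def __init__(self, target_rate: float, win_length: float) -> None:
        self.win_length = win_length   # Win_length: a constant
        self.rate = target_rate        # R_tsw starts at the target rate
        self.t_front = 0.0             # T_front: time of the last update

    def update(self, now: float, pkt_size: float) -> float:
        """Process one packet arrival at time `now` and return the new R_tsw."""
        bytes_in_tsw = self.rate * self.win_length
        new_bytes = bytes_in_tsw + pkt_size
        self.rate = new_bytes / (now - self.t_front + self.win_length)
        self.t_front = now
        return self.rate


tsw = TimeSlidingWindow(target_rate=1000.0, win_length=1.0)
r1 = tsw.update(now=1.0, pkt_size=1000.0)   # 2000 bytes over 2.0 s
r2 = tsw.update(now=1.5, pkt_size=1000.0)   # 2000 bytes over 1.5 s
```

In this example the estimate stays at 1000 bytes/s after the first packet, because the packet exactly matches the target rate, and rises to about 1333 bytes/s when the next packet arrives only 0.5 s later.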
[Figure: sending rate of a high target rate session and of a low target rate session]

Fig. 3. MCOAS performance with on-off AS sessions



In the new marking policy, two sending rates are measured: 1) the sending
rate, R_tsw, is measured with the TSW algorithm as usual; 2) the average
session sending rate, R_session, of each TCP session is also measured, and can be
calculated with the following equation:

$$
R_{session} = \frac{N_{sender}}{T_{duration}}, \qquad (1)
$$

where N_sender is the total number of bits sent by the sender during the current
session, and T_duration is the duration of the current session.

Since the MCOAS model is a connection-oriented model in which connection
establishment and teardown are required, R_session can be easily measured.
Accordingly, when R_tsw < R_target and R_session < R_target, the packet is
marked as an AS packet. When R_tsw > R_target and R_session < R_target, we mark
the packet as an MA packet. When R_session > R_target, we mark the packet as an
EX packet.
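
The three marking rules can be summarized in a short Python sketch. It is illustrative only: the function names are hypothetical, and the handling of ties is an assumption, since the text only states strict inequalities. Here a rate exactly equal to the target is treated as not exceeding it.

```python
def session_rate(bits_sent: float, duration: float) -> float:
    """Average session sending rate: bits sent so far divided by session duration."""
    return bits_sent / duration


def mark_packet(r_tsw: float, r_session: float, r_target: float) -> str:
    """Return the mark ("AS", "MA" or "EX") for a packet of an assured-service flow.

    Assumption of this sketch: a rate equal to the target counts as
    "not above" the target, because the text leaves ties unspecified.
    """
    if r_session > r_target:
        return "EX"    # the session as a whole is already above its target
    if r_tsw > r_target:
        return "MA"    # short-term burst, but the session average is within target
    return "AS"        # both short-term and session rates are within target


marks = [
    mark_packet(r_tsw=0.8, r_session=0.9, r_target=1.0),
    mark_packet(r_tsw=1.2, r_session=0.9, r_target=1.0),
    mark_packet(r_tsw=0.5, r_session=1.1, r_target=1.0),
]
```

The design point is that a session which is still below its target on average is not penalized as heavily for a short burst: its excess packets become MA rather than EX.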



3.2 Dropping Algorithm 

The new dropping policy is described as follows:

If (packet type is AS)
    Process the packet in the same way as RIO processes an IN packet
Else if (packet type is BE and queue length of the best-effort packets > Kte)
    Drop the BE packet
Else if (packet type is MA)
    Process the packet in the same way as RIO processes an IN packet,
    with different thresholds
Else
    Treat EX and BE packets the same way as OUT packets in RIO

The only difference between this policy and MCOAS is that this policy
uses three parameter sets to further discriminate between the AS, MA and
EX packets.



4 Simulation Results and Analysis 




Fig. 4. MCOAS performance with on-off AS sessions 



To examine the performance of the newly proposed schemes, we run the simulation
with the network setup in Fig. 2 again. This time we want to test the performance
of each session, and the results are listed in Table 4 with a slash separating the
throughput from the goodput.

The parameters for the dropping policy are set at (min_in, max_in, P_in) =
(40, 70, 0.02) for IN packets, (min_ma, max_ma, P_ma) = (30, 50, 0.08) for MA packets
and (min_out, max_out, P_out) = (10, 30, 0.2) for BE and EX packets.
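
To make the role of the three parameter sets concrete, the following Python sketch evaluates the basic RED drop-probability ramp for each packet class. Several points are assumptions of this sketch rather than statements from the paper: it uses the simple linear RED ramp without the inter-drop count correction, it takes the average queue length (in packets) as an input instead of modelling which average each class uses, and it omits the separate best-effort queue-length check. The names `RedParams`, `PARAMS` and `drop_probability` are hypothetical.

```python
from typing import NamedTuple


class RedParams(NamedTuple):
    min_th: float   # below this average queue length nothing is dropped
    max_th: float   # at or above this average queue length everything is dropped
    p_max: float    # drop probability reached just below max_th


# Parameter sets quoted in the text: AS packets are handled like IN packets,
# MA packets use their own thresholds, EX and BE packets are handled like OUT.
PARAMS = {
    "AS": RedParams(40, 70, 0.02),
    "MA": RedParams(30, 50, 0.08),
    "EX": RedParams(10, 30, 0.2),
    "BE": RedParams(10, 30, 0.2),
}


def drop_probability(avg_queue: float, params: RedParams) -> float:
    """Basic RED ramp: 0 below min_th, linear up to p_max, 1 at or above max_th."""
    if avg_queue < params.min_th:
        return 0.0
    if avg_queue >= params.max_th:
        return 1.0
    span = params.max_th - params.min_th
    return params.p_max * (avg_queue - params.min_th) / span


# At an average queue of 35 packets, AS packets are still safe while
# MA packets see a small drop probability and EX/BE packets are always dropped.
example = {mark: drop_probability(35, p) for mark, p in PARAMS.items()}
```

With these thresholds the three classes start being dropped at increasingly long average queues (10, 30 and 40 packets), which is what lets the router discriminate between EX, MA and AS packets without per-flow state.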

From Table 4 it is clear that with the modifications, the performance of each
TCP session has been significantly improved. In order to show the improvement,
we plot the throughput deviation for both policies in Fig. 4. The solid line
represents the ratio of end-to-end throughput to target rate under the original
policy, and the dotted line shows the same ratio under the new policy. It is clear
that with the new policy, the deviation of each TCP session is significantly reduced.
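
The reduction in deviation can also be checked directly from the per-flow throughputs reported in Table 4. The Python sketch below computes the throughput-to-target ratio of every flow and, as one possible summary, the mean absolute deviation of that ratio from 1. The choice of this summary statistic is an assumption made here for illustration; the paper itself only plots the ratios.

```python
# Per-flow targets and throughputs in Mbps, flows 0..9, taken from Table 4.
targets = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]
mcoas = [4.7, 2.7, 4.4, 2.0, 4.0, 2.0, 3.8, 1.5, 3.0, 1.3]
new_policy = [4.9, 1.7, 4.7, 1.5, 4.4, 1.2, 4.2, 1.2, 4.1, 1.2]


def ratios(throughputs: list[float], targets: list[float]) -> list[float]:
    """Throughput-to-target ratio of each flow (1.0 means exactly on target)."""
    return [thr / tgt for thr, tgt in zip(throughputs, targets)]


def mean_abs_deviation(throughputs: list[float], targets: list[float]) -> float:
    """Average distance of the throughput-to-target ratio from the ideal value 1."""
    r = ratios(throughputs, targets)
    return sum(abs(x - 1.0) for x in r) / len(r)


mad_mcoas = mean_abs_deviation(mcoas, targets)       # about 0.552
mad_new = mean_abs_deviation(new_policy, targets)    # about 0.234
```

By this measure the average deviation from the target drops from about 0.55 to about 0.23, while the 1 Mbps flows still receive more than their target under both policies.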

With the new policies, the MCOAS model maintains a high end-to-end aggregated
throughput, and the throughput achieved by each TCP session is close to the
ideal throughput. However, the MCOAS model still cannot provide the ideal
throughput guarantee to each TCP session.
Table 4. Individual throughput/goodput for the proposed scheme and MCOAS
(Unit: Mbps)

Flow    RTT (ms)   Target   New Policy   MCOAS
0       20         5        4.9/4.8      4.7/4.5
1       20         1        1.7/1.6      2.7/2.6
2       40         5        4.7/4.5      4.4/4.3
3       40         1        1.5/1.4      2.0/1.9
4       50         5        4.4/4.3      4.0/4.0
5       50         1        1.2/1.1      2.0/1.8
6       70         5        4.2/4.1      3.8/3.7
7       70         1        1.2/1.0      1.5/1.4
8       100        5        4.1/4.0      3.0/3.0
9       100        1        1.2/1.0      1.3/1.2
Total   -          30       29.1/27.8    29.2/28.4



5 Conclusions and Future Work 

In this paper, new marking and dropping policies are proposed. We are able to
show that, with the proposed scheme, the fairness problem between
individual flows within the aggregated flow can be solved. The proposed scheme
does not require the core routers to keep per-flow information, so it is
highly scalable. When working with the MCOAS model, we are able to provide
end-to-end throughput assurance to individual flows without requiring the core
routers to keep per-flow information.

It is worth mentioning that the MCOAS model is a connection-oriented
model so it can be easily used on top of a MPLS enabled network 0 . The future 
work will focus on how to combine these two technologies.
Abstract. Integrated Services over Local Area Networks are treated.
We describe the main functionalities of LAN devices that allow the real- 
ization of the Integrated Services model in such networks. Based on an 
admission control algorithm capable of providing queueing delay guar- 
antees across a LAN, we evaluate the share of the network capacity that 
can be used by flows requiring QoS guarantees. We find that for LANs 
of realistic size, only a small percentage of the network capacity can be 
made available in order not to violate the QoS guarantees. 



1 Introduction 

The goal of the Integrated Services (IntServ) model is to provide a well-defined 
QoS to data flows along the entire transmission path between the senders and 
the receivers of the flows. Local Area Networks (LANs) generally constitute the
last hops to terminal devices that act as senders and receivers of data flows.
Traditional LANs, often composed of different LAN technologies, do not allow 
for service differentiation due to the lack of both QoS supporting mechanisms in 
LAN technologies like Ethernet and QoS signaling for data flows across differ- 
ent technologies. As real-time applications will gain importance in future com- 
munications systems, there has been much interest for the provision of service 
differentiation in LANs. This has been made possible by evolving link layer tech- 
nologies and the standardization of capabilities needed for service differentiation 
by the IEEE for the most important LAN technologies, defined within the project 
802, including the (switched) Ethernet/IEEE 802.3 and (shared/switched) To- 
ken Ring/IEEE 802.5 networks. Shared Ethernet networks are not capable of 
guaranteeing transmission delay bounds and thus are not suitable for realizing 
the IntServ model. 

Service differentiation is achieved by assigning priorities to packets and using 
priority scheduling at forwarding devices. The use of priorities for service
differentiation has the consequence that different flows using the same priority
cannot be distinguished inside a LAN and thus cannot be isolated from one another
at forwarding devices. This so-called aggregate scheduling is very different from
the treatment of IntServ flows at routers. In this article we study the influence
of aggregate scheduling on the performance of such networks. The evaluations
are based on an admission control algorithm that limits the high-priority traffic
such that delay bounds can be guaranteed. The algorithm is based on recently
derived delay bound results for networks with aggregate scheduling.

2 IEEE 802 LAN Model 

Let us consider a layer 2 domain which we define as a closed network region in 
which frames are forwarded based on layer 2 addressing, thus without employing 
layer 3 forwarding functionality. A layer 2 domain consists of segments that are
separated by bridges or switches. A segment is a physical medium to which one,
two, or more senders are connected. Examples of segments are: 

- a Token Ring,

- a half duplex link between two stations,

- one direction of a full duplex link.

Bridges are forwarding devices that operate at layer 2 (L2) of the OSI ref- 
erence model. As such, they are independent of the higher layer protocols used. 
They accept incoming frames, decide based on the information contained in the 
frames to which output ports the frames have to be forwarded, and transmit 
them over the selected ports. The forwarding decisions are based on physical 
layer addresses, as opposed to logical addresses used in layer 3 (L3) devices 
(routers). Due to their simple operation mode, bridges can forward frames at 
high speed. Switches are similar to bridges as they also interconnect LAN seg- 
ments, forward frames based on physical addresses, and filter traffic. They are 
different, however, in that switches are high-speed devices that make forwarding 
decisions in hardware whereas bridges operate in software. Therefore, switches 
can serve more ports than bridges. In the following we will use the term bridge
to refer to both types of devices.

Since bridges operate at the data link layer, they cannot identify individual 
data flows, which are distinguished by elements of the layer 3 packet headers. Per- 
flow queueing, policing, and reshaping, which are required for Controlled Load 
(CL) and Guaranteed Service (GS), thus cannot be implemented. Bridges, how- 
ever, can provide capabilities allowing an approximation of these services that 
may be sufficient for most practical purposes. The characteristics of IEEE 802 
bridges are standardized in order to assure the interoperability of different LAN 
technologies. The latest revision of this standard has introduced new capabilities 
of bridges that provide the base for service differentiation in IEEE 802 LANs. 

Such bridges have up to 8 output queues per port. The default scheduling 
algorithm uses static priorities assigned to the output queues. The specifica- 
tion for LAN bridges allows user priorities to be assigned to frames. A frame 
is mapped to an output queue depending on its user priority. User priorities, 
multiple output queues, and appropriate scheduling algorithms together make 
it possible to define different traffic classes, e.g., for best effort, video, and voice 
traffic. A traffic class is hence an aggregation of data flows which are given a 
similar service within an IEEE 802 network. The isolation of flows required for
Table 1. Service mapping of user priorities

User priority   Service
7               Network control
6               Delay sensitive appl., 10 ms bound
5               Delay sensitive appl., 100 ms bound
4               Delay sensitive appl., no bound
3               Currently unused
2               Currently unused
1               Less than Best Effort
0               Default service (Best Effort)

the IntServ model, however, is only coarse since individual flows of the same 
traffic class cannot be distinguished. If the high-priority traffic uses only a small 
fraction of the available transmission capacity, this may be sufficient to achieve 
a good approximation of the IntServ model. In this case, a very high percentage 
of frames is delayed at a bridge by at most the transmission time of a maximum 
sized frame. The percentage of frames experiencing longer queueing delays may 
be negligible for practical purposes. Admission control can limit the amount of 
time-critical traffic offered to bridges and thus influences the fraction of frames 
suffering longer queueing delays. 

The semantics of the user priorities are defined as shown in Tab. 1.
Delay sensitive applications that need quantifiable queueing delay bounds can
use the user priorities 5 and 6, depending on their requirements. Following the
IEEE 802.1D specification, we consider the delay bound values given in Tab. 1
as the total maximum queueing time of a frame across an entire L2 domain.
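
The priority semantics of Tab. 1 can be captured in a small lookup structure. The following is only an illustrative sketch: the dictionary name and the helper `priority_for_delay_bound` are hypothetical and not part of the IEEE specification. It picks the user priority with the loosest bound that still satisfies an application's requirement.

```python
# User priority -> (service, end-to-end queueing delay bound in seconds or None),
# transcribed from Tab. 1. Names below are illustrative assumptions.
USER_PRIORITY_SERVICE = {
    7: ("Network control", None),
    6: ("Delay sensitive appl., 10 ms bound", 0.010),
    5: ("Delay sensitive appl., 100 ms bound", 0.100),
    4: ("Delay sensitive appl., no bound", None),
    3: ("Currently unused", None),
    2: ("Currently unused", None),
    1: ("Less than Best Effort", None),
    0: ("Default service (Best Effort)", None),
}


def priority_for_delay_bound(required_bound_s):
    """Return the user priority with the loosest bound that still meets
    the required queueing delay bound, or None if no class is tight enough."""
    candidates = [
        (bound, prio)
        for prio, (_, bound) in USER_PRIORITY_SERVICE.items()
        if bound is not None and bound <= required_bound_s
    ]
    if not candidates:
        return None
    return max(candidates)[1]
```

For example, a flow that tolerates 200 ms of queueing delay would be placed in priority 5, whereas a flow that tolerates only 50 ms needs priority 6.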

2.1 Topologies of IEEE 802 LANs 

L2 domains may have arbitrary topologies in which loops and multiple paths 
between edge devices may exist. Two different approaches exist to determine a 
path that a frame will follow across an 802 LAN:

— the IEEE 802 spanning tree protocol, 

— source routing. 

In the specification for IEEE 802 LAN bridges, a spanning tree protocol is 
defined that overlays the actual LAN topology with a spanning tree that contains 
exactly one path between each pair of LAN segments. Ports of bridges that are 
not part of the spanning tree are disabled. This establishes a virtual, connected, 
loop free topology, called the active topology. 

If source routing is used, L3 devices that send frames onto L2 segments 
have to include into the frame header the exact path that the frame has to 
follow across the LAN to the destination L3 device. To learn the route to a 
given destination, an L3 device can send out route explorer frames that traverse
the network and reach the destination, which sends back a response including
the path taken by the explorer frame. Explorer frames can be forwarded by
bridges using broadcasting (All Routes Explorer frames), allowing the discovery
of all routes to the destination devices, or following a configured spanning tree
(Spanning Tree Explorer frames) of the network, resulting in a unique route to
the destination. 

Ethernet devices use the spanning tree protocol while Token Ring and FDDI 
devices mainly implement source routing and use the spanning tree protocol only
to a lesser extent.

3 Admission Control 

Admission control must be employed to offer the required QoS to data flows.
Recently, queueing delay bounds for networks with aggregate scheduling have been
derived in the context of Differentiated Services networks offering Expedited
Forwarding Service. It can be shown that these queueing delay bounds are
also applicable to L2 domains with priority queueing and multiple priorities.

In principle, these queueing delay bounds could be used to decide whether a 
new flow requiring a guaranteed maximum queueing delay can be accepted in an
L2 domain or not. A flow would have to be refused if the resulting delay bounds
exceeded the allowed value. However, the service offered by an interface to the flows
of a given user priority p depends on the traffic of all flows with priorities higher
than p. Therefore, the admission of a high priority flow at an interface influences
the service offered to lower priority flows, and consequently also the queueing
delay bound that can be guaranteed to these flows. This necessitates a
complex admission control model which has to consider the possible changes of 
the queueing delay bounds provided to low priority flows in order to decide if a 
high priority flow can be accepted. 

To reduce the complexity of the admission control decision we have developed
an admission control algorithm that limits the traffic in each priority class
such that the defined queueing delay bounds can be guaranteed. This admission
control is based on the following assumptions:

- Each flow f subject to admission control conforms to a token bucket
traffic controller with token rate $r_f$ and bucket depth $b_f$ upon entry to the
L2 domain. This can be achieved by shaping the flows to their IntServ traffic
specification.

- For each priority p there exists a constant $\tau_p$ such that for all interfaces I of
the L2 domain

$$\sum_{f \in S_{I,p}} b_f \le \tau_p \sum_{f \in S_{I,p}} r_f$$

where $S_{I,p}$ is the set of all flows with priority p that traverse interface I.

The parameters $\tau_p$ represent bounds on the sum of the bucket depths of the
flows using a user priority p with respect to the sum of their mean rates. Often, 
for a given type of flows (e.g., voice or video flows) there is a linear relationship 



382 J. Ehrensberger 



between their bucket depth and their mean rate. However, $\tau_p$ also depends on the
mix of flow types using a given user priority p. Therefore, we suppose that the
actual values of the parameters $\tau_p$ will in most cases be determined empirically
by the network administrator using measurements and used as configuration
parameters of the admission control algorithm. Under these assumptions, the
maximum rate that can be admitted to each priority class with queueing delay
bounds can be computed directly. The admission control algorithm therefore
becomes very simple. It suffices to ensure that the sum of the mean rates of all
admitted flows is at most equal to the maximum admissible rate.
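
The resulting admission test can be sketched in a few lines. This is a minimal illustration, not the implementation used in the paper: the class and function names are hypothetical, the maximum admissible rate is assumed to be given (its computation from the delay bound results is not reproduced here), and rates are assumed to be in bit/s, bucket depths in bits and the burstiness constant in seconds.

```python
class PriorityClassAdmission:
    """Rate-based admission control for one priority class at one interface."""

    def __init__(self, max_admissible_rate_bps):
        # Assumed to be precomputed for this interface and priority class.
        self.max_rate = max_admissible_rate_bps
        self.admitted = {}  # flow id -> mean (token) rate in bit/s

    def admit(self, flow_id, mean_rate_bps):
        """Accept the flow only if the sum of mean rates stays within the limit."""
        if sum(self.admitted.values()) + mean_rate_bps <= self.max_rate:
            self.admitted[flow_id] = mean_rate_bps
            return True
        return False

    def release(self, flow_id):
        """Remove a flow that has terminated."""
        self.admitted.pop(flow_id, None)


def burstiness_ok(flows, tau):
    """Check the second assumption: sum of bucket depths <= tau * sum of rates.

    flows is a list of (rate, bucket_depth) pairs.
    """
    total_rate = sum(rate for rate, _ in flows)
    total_depth = sum(depth for _, depth in flows)
    return total_depth <= tau * total_rate
```

With a limit of 180 000 bit/s, for instance, two flows of 64 000 bit/s each are admitted and a third one is refused.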

It turns out that the maximum admissible rate for a priority class at an
interface depends on the ‘distance’ of the interface from the edge of the L2 domain.
Formally, we define the eccentricity of an interface I as the maximum number of
interfaces that any loop-free path between any interface of the L2 domain and
the interface I, exclusive, may traverse.* It follows from this definition that the
eccentricity of an ingress interface I to the L2 domain is 0, since there is no path
to I from any interface different from I. We define the diameter of a network as
the maximum number of interfaces that any loop-free path from an interface I
to an egress node may traverse.
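
To make the two definitions concrete, the sketch below computes them for a loop-free active topology. It rests on assumptions that are not in the text: interfaces are modelled as nodes of an acyclic graph, `upstream` maps each transmitting interface to the interfaces whose frames may reach it next, and the diameter is taken as the largest eccentricity plus one, which matches the example of an interface of eccentricity 5 in a domain of diameter 6. All names are hypothetical.

```python
from functools import lru_cache


def eccentricities(upstream):
    """Return {interface: eccentricity} for an acyclic interface graph.

    upstream[i] lists the interfaces that can directly precede i on a path.
    An interface without predecessors (an ingress interface) has eccentricity 0.
    """

    @lru_cache(maxsize=None)
    def ecc(interface):
        preds = upstream.get(interface, ())
        if not preds:
            return 0
        # Longest chain of interfaces that can precede this one.
        return 1 + max(ecc(p) for p in preds)

    return {interface: ecc(interface) for interface in upstream}


def network_diameter(upstream):
    """Number of interfaces on the longest path (assumption: max eccentricity + 1)."""
    return 1 + max(eccentricities(upstream).values())
```

For a simple chain of six interfaces the eccentricities run from 0 to 5 and the diameter is 6.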



4 Comparison of Network Configurations 

The maximal admissible rate for different traffic classes limits the transmission 
rate that can be made available at an interface for data flows requiring queueing 
delay bound guarantees. The admissible rate depends on factors like the network 
size and topology or the LAN technologies used. This must be considered in the 
planning of a network in order to correctly dimension the link capacities for the 
estimated traffic load. 



4.1 Full Duplex Switched Ethernet Domains 

As a first network configuration we consider an L2 domain employing switched
Ethernet in the entire domain, as shown in Fig. 1. Depending on the capacity
requirements, Ethernet (10 Mbits/s), Fast-Ethernet (100 Mbits/s) or
Gigabit-Ethernet (1 Gbits/s) is installed in the different parts of the network.
Although the end-systems have knowledge about individual flows, they are assumed
to use aggregate scheduling on the interfaces transmitting into the L2 domain.
Voice flows with an estimated value of $\tau$ = 0.025 are mapped to the 10 ms delay
bound service whereas video flows with $\tau$ = 0.09 use the 100 ms delay bound
service. The influence of network control traffic is neglected.

The delay bounds of the service classes are for the crossing of the entire L2 
domain. The delay bound that has to be guaranteed at each interface is thus 
obtained by dividing the end-to-end delay bound by the diameter of the L2 
domain, which has a value of 6 interfaces in this case. 
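
As a worked instance of this division, using only the values stated above (the symbols $D$ for the end-to-end bound, $h$ for the diameter and $d$ for the per-interface bound are introduced here purely for illustration):

```latex
d = \frac{D}{h}, \qquad
\frac{10\ \text{ms}}{6} \approx 1.67\ \text{ms}, \qquad
\frac{100\ \text{ms}}{6} \approx 16.7\ \text{ms}
```

Each interface therefore has to guarantee a queueing delay of roughly 1.67 ms for the 10 ms service and 16.7 ms for the 100 ms service.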

* This definition is compatible with the definition of eccentricity in graph theory.
[Figure legend: Ethernet switch; End-system; Router; Ethernet 10 Mbits/s; Fast-Ethernet 100 Mbits/s; Gigabit-Ethernet 1 Gbits/s]

Fig. 1. Switched Ethernet L2 domain

Fig. 2. Service class utilization factor of a switched Ethernet L2 domain



We want to evaluate the maximum share of the transmission capacity of each 
interface that can be used for the service classes with delay bounds. Therefore, 
we define a service class utilization factor as the ratio of the maximum admissible 
rate R of a service class and the interface transmission capacity C.

Fig. 2 shows the service class utilization factors of the considered
configuration for different interface eccentricities. The values for
Gigabit-Ethernet are not shown since they are very close to those of
Fast-Ethernet. The figure has to be interpreted as follows. Consider the path from
host H1 to host H2. The interface H1 → A has an eccentricity of 0 and uses
Ethernet 10 Mbits/s, therefore a maximum of 1.8% of the interface capacity can be
used for data flows requiring a 10 ms queueing delay bound. For the 100 ms
queueing delay bound service class, at maximum 16.3% of the interface capacity
can be used. The interface from A to B has eccentricity 1 and uses Fast-Ethernet,
hence at maximum 5.8% and 13.2% of the interface capacity can be offered to the
10 ms and 100 ms queueing delay bound service classes, respectively. It can be
seen that the service class utilization factors decrease with the eccentricity of
an interface. In the extreme case of the interface with eccentricity 5 from
switch E to host H2, only 1.3% and 8.5% of the interface capacity are available
for the two service classes.
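
The percentages quoted above translate into absolute rates by multiplying with the interface capacity. The following sketch only restates that ratio; the function names are hypothetical, and the utilization factors themselves are taken from the text rather than computed.

```python
def admissible_rate_bps(utilization_factor, capacity_bps):
    """Maximum admissible rate of a service class at an interface."""
    return utilization_factor * capacity_bps


def utilization_factor(admissible_rate, capacity_bps):
    """Service class utilization factor: admissible rate over capacity."""
    return admissible_rate / capacity_bps


# Values quoted in the text for the 10 ms service class:
rate_h1_a = admissible_rate_bps(0.018, 10_000_000)   # about 180 kbit/s
rate_a_b = admissible_rate_bps(0.058, 100_000_000)   # about 5.8 Mbit/s
```

Although the percentage at the 10 Mbits/s ingress interface is small, the Fast-Ethernet interface still offers a much larger absolute rate because of its higher capacity.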
Fig. 3. Service class utilization factors in Fast-Ethernet networks of different
sizes: (a) 10 ms queueing delay bound, (b) 100 ms queueing delay bound



Influence of the Network Diameter. Increasing the diameter of an L2 domain
decreases the service class utilization factors due to two effects. Firstly, the
queueing delay bound at each individual interface decreases since more interfaces
may be traversed by a flow and the end-to-end queueing delay bound across the
entire L2 domain must not change. Secondly, interfaces may have higher
eccentricities, which reduces the maximum admissible rate. Fig. 3(a) and Fig. 3(b)
show the maximum service class utilization factors at Fast-Ethernet interfaces 
for different interface eccentricities and network diameters. It can be seen that 
especially for the 10 ms queueing delay bound service class, the diameter of the 
L2 domain strongly influences the utilization factor. For example, at an interface 
of eccentricity 3, the utilization factor for the 10 ms service class changes from 
7.3% to 3.9% as the diameter doubles from 4 to 8. For the 100 ms service class, 
12.5% and 8.4% utilization can be achieved at an interface of eccentricity 3 for 
a network diameter of 4 and 8, respectively. 

4.2 Token Ring Networks 

The simplest configuration of a Token Ring LAN consists of a single
shared ring segment to which all devices of the LAN are connected, as shown in
Fig. 4(a). A delay of N · THTmax (including medium access delay and maximum
frame time) may arise at an interface on a shared Token Ring segment before
the highest priority queue can be served. Here, N is the number of stations
sending high priority traffic and THTmax is the maximal token holding time.
In general, we can assume that all connected stations may send high priority
traffic. The default value for THTmax of 10 ms is too high to realize the 10 ms
queueing delay bound service. To choose an appropriate value of THTmax, the
number of connected stations and also the desired utilization factors for low delay
bound services should be taken into account, since these services are primarily
influenced by the medium access delay. Fig. 5 shows the utilization factors on a
Fig. 4. Token Ring configurations (16 Mbits/s): (a) simple shared configuration,
(b) switched network



[Plot: utilization factor (axis ticks 0 to 0.5) versus connected stations (2 to 10); curves: 0.01s delay bound, 0.1s delay bound]

Fig. 5. Service class utilization factor of a single shared Token Ring segment



single 16 Mbits/s Token Ring segment with THTmax = 1 ms and a varying number
of connected stations. Since this L2 domain has a diameter of 1, high utilization 
factors can be achieved for a small number of connected stations. However, the 
utilization factor for the 10 ms queueing delay bound service decreases linearly 
with the number of connected stations, due to the increasing influence of the 
medium access delay. For 10 or more connected stations, this service cannot be 
realized anymore, since the sum of the medium access delay and the maximum 
frame time is greater than the required queueing delay bound. The utilization 
factor for the lower priority 100 ms queueing delay bound service increases with
the number of connected stations up to 10 stations, because of the reduction of
the higher priority traffic.
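
The limit of 10 stations can be reproduced with a simple budget calculation. The sketch below is an illustration under stated assumptions: the function names are hypothetical, times are kept as integer microseconds to avoid rounding effects, and the service is treated as realizable only if some queueing budget remains after subtracting N · THTmax from the per-interface delay bound.

```python
def residual_queueing_budget_us(stations, tht_max_us, delay_bound_us):
    """Delay budget left after medium access delay and maximum frame time."""
    return delay_bound_us - stations * tht_max_us


def service_realizable(stations, tht_max_us, delay_bound_us):
    """Assumption: the service needs a strictly positive residual budget."""
    return residual_queueing_budget_us(stations, tht_max_us, delay_bound_us) > 0


# Single shared segment (diameter 1), THTmax = 1 ms, 10 ms service class
for n in (2, 9, 10):
    print(n, service_realizable(n, 1_000, 10_000))
```

The residual budget shrinks linearly with the number of stations, which is consistent with the linear decrease of the 10 ms utilization factor described above.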

The Token Holding Time THTmax limits the maximum frame size that can
be used by a station. For THTmax = 1 ms, the maximum frame size is about
Table 2. Utilization factors for source routed and spanning tree Token Ring
configurations

Interface       Source Routing                   Spanning tree
Eccentricity    10 ms service   100 ms service   10 ms service   100 ms service
0               6.0%            23.3%            4.0%            19.1%
1               5.5%            18.4%            3.7%            15.7%
2               5.0%            15.2%            3.4%            13.3%
3               4.6%            12.9%            3.2%            11.6%
4               -               -                3.0%            10.2%

2000 bytes. In order to avoid excessive protocol header overhead, its value cannot 
be arbitrarily small. Consequently, the number of stations that can be connected 
to a shared Token Ring segment is very limited. Separating the LAN into mul- 
tiple shared segments interconnected by bridges does not improve the situation 
since this leads to longer paths across the entire LAN and therefore smaller per- 
interface queueing delay bounds. We can therefore conclude that shared Token 
Ring segments are not well suited to implement services with low queueing delay
bound guarantees. The solution to this is micro-segmentation, i.e., the use of
half-duplex or full-duplex switched Token Ring segments with only two senders or
one sender per segment, respectively. Nevertheless, a small maximum frame size should
be chosen also in these cases to avoid long delays for high priority packets due to 
the transmission of big lower priority packets. In the following we will therefore 
again assume a maximum Token Holding Time of 1 ms, which also determines 
the maximum frame time. 
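
The frame size of about 2000 bytes quoted above follows directly from the ring speed and the token holding time. The sketch below is a back-of-the-envelope check only: the function name is hypothetical, and token and header overhead are ignored, which is why the text says "about".

```python
def max_frame_size_bytes(ring_rate_bps, tht_max_us):
    """Largest frame that can be transmitted within one token holding time."""
    bits = ring_rate_bps * tht_max_us // 1_000_000
    return bits // 8


# 16 Mbits/s Token Ring with THTmax = 1 ms
frame_bytes = max_frame_size_bytes(16_000_000, 1_000)
```

With the default token holding time of 10 ms the same calculation gives a frame ten times larger, which illustrates why a lower-priority frame could block high priority traffic for too long.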

Token Ring networks by default employ general source routing, which allows
discovering the best paths between two nodes of the network. This has the
advantage that the traffic load on the segments can be balanced. Fig. 4(b) shows a
switched Token Ring network consisting of four interconnected 16 Mbits/s
full-duplex Token Ring switches. If general source routing is employed and assuming
that the end-systems always choose the shortest path to transmit frames, at most
four transmission interfaces may be traversed by any flow inside this L2 domain.
The transmission interfaces at the end-systems have eccentricity 0. The
interfaces of the switches towards the end-systems have the maximum eccentricity of
3. All other interfaces have an eccentricity of 2.

Source routing in Fig. 4(b) may also be based on a spanning tree configuration.
For this purpose, switch A may be chosen as the root of the spanning tree and the
segment between C and D may be blocked. This configuration has a maximum
path length of 5 and a maximum interface eccentricity of 4. Tab. 2 compares the
resulting service class utilization factors of the two configurations. Especially in
the case of the 10 ms queueing delay bound service, the achievable utilization
factors for equal interface eccentricities are substantially lower for the spanning
tree configuration. This is due to the increased number of interfaces to traverse
in this configuration. The main difference between the two configurations can be
found at the interface A → D. This interface has eccentricity 2 in the
source-routed configuration and eccentricity 3 in the spanning tree configuration. This
means that 5.0% of the transmission capacity is available for the 10 ms
queueing delay bound service in the former configuration compared to only 3.2% in
the latter. For the 100 ms service, the utilization factor decreases from 15.2% to
11.6% when using the spanning tree protocol. Also, the traffic experienced on
this interface may be higher in this case, since all flows from A, B, or C towards
D have to traverse this interface. In the source-routed configuration, only flows
from A to D and about half of the flows from A to C and from B to D use
this interface. We thus find that the spanning tree topology introduces a
potential bottleneck at the root of the tree that does not exist in the source-routing
configuration.

5 Conclusions 

In this article, the possibilities to realize the IntServ model in IEEE 802 LANs 
as well as the limiting factors have been presented. The first part gives a descrip- 
tion of the mechanisms defined by the IEEE that make it possible to support 
Integrated Services in such networks. We have seen that the major difference
between layer 2 devices and layer 3 devices is that the former cannot identify
individual data flows.

For the dimensioning of a LAN it must be known which traffic intensity may 
be admitted on the network interfaces for different service classes. Therefore, it 
is necessary to define the admission control algorithm used on the interfaces. An 
admission control algorithm based on the queueing delay bounds result would 
be very complex and thus is unlikely to be implemented on layer 2 devices. We 
have therefore developed a simplified admission control algorithm that allows us 
to explicitly compute the maximum mean rate of the aggregate flows of the dif- 
ferent service classes that can be accepted at an interface of a given transmission 
capacity. 

We use the developed model to compare LAN configurations with respect
to the utilization factors that can be achieved at interfaces for different service 
classes. The main result is that for configurations of realistic size in general only 
a small fraction of the total interface capacity can be used for flows requiring 
queueing delay bounds. We show how the achievable utilization factors are in- 
fluenced by the network size, the LAN technologies used and the routing scheme 
of frames across the LAN. 
Abstract. IP in the edge and ATM in the core are commonplace in
today’s internetworks. The IETF has proposed a new Quality of Ser- 
vice (QoS) mechanism namely Differentiated Services (DiffServ) for IP 
networks. On the other hand, QoS is an inherent feature in ATM. It is 
imperative that IP and ATM QoS interoperate efficiently to provide an 
end-to-end service guarantee. DiffServ provides a class of service named 
Assured Forwarding (AF) that does not exactly correlate to any of the 
service categories offered by ATM. AF is targeted towards a range of 
applications, such as real-time (rt) that do not require a constant bit
rate service provided by Expedited Forwarding, and other non-real-time 
(nrt) applications that expect a service better than Best Effort. 

In this paper we propose the mapping of AF to the Variable Bit Rate 
(VBR) service category in ATM. VBR is suitable because it is available in 
the form of rt-VBR and nrt-VBR and could be translated appropriately 
based on the applications. The mapping is implemented and verified 
using the LBNL Network Simulator. The results of the experiments show 
that VBR is a better match for AF than any other service category in 
ATM. 

1 Introduction 

Recent advances in communications have enabled computer networks to support
a wide spectrum of applications such as voice, multimedia, and traditional
data. The introduction of voice and multimedia demands stringent service
requirements such as bounded end-to-end delay and delay variance, in addition to
a guaranteed traffic delivery mechanism. Quality of Service is envisioned as an 
essential component in building efficient networks. 

Several efforts in the area of QoS have resulted in approaches such as
Integrated Services (IntServ), MPLS traffic engineering, and Differentiated
Services in the IP domain and the ATM Traffic Management Specification in
the ATM domain. IntServ offers an end-to-end service guarantee with the Resource
Reservation Protocol (RSVP) as the signaling tool to reserve resources at
every node in a path for every flow. The reservations are maintained in these
nodes using a soft-state database, imposing a very high demand for processing
time and state maintenance storage in the backbone routers. MPLS Traffic
Engineering is an ongoing effort by the Internet Engineering Task Force (IETF).

* This research was supported in part by NSF-ARI grant No. 9601602. 
A different QoS approach, Differentiated Services (DiffServ), has been recently
proposed by the IETF. DiffServ attempts to reduce the processing time by pushing
the functional elements required to implement QoS towards the edges of a
network. QoS provisioning is based on aggregates of flows, which further reduces
the state information maintained on individual routers.

There are advantages to both IP and ATM technologies which have necessi- 
tated their co-existence in the network infrastructure. For instance, the ability 
of IP to adapt rapidly to changing networking conditions makes it appropriate 
for core routers. On the other hand, the scalability and cost/performance model 
of ATM switches are appropriate for backbone networks. The interoperation of 
the features of the two technologies to provide end-to-end QoS is crucial for the 
emergence of fast and reliable next-generation networks. 

One of the issues in integrating IP DiffServ with ATM QoS is translating the 
Assured Forwarding Per Hop Behavior (AF PHB) service requirements onto
the ATM domain. The AF PHB is targeted towards a range of applications whose 
service requirements may vary from a level better than best-effort to applications 
that require a minimum guaranteed rate and delay characteristics. Additionally, 
AF introduces a concept of relativity that allows multiple AF aggregates of a 
class to be provisioned relative to one another. 

In this paper we propose to map the AF PHB to the VBR service category 
of ATM. VBR is attractive because of its ability to serve both real-time and 
non-real-time applications. Through simulation experiments, we show that the AF
relativity concept can be achieved by tuning the traffic parameters of different 
VBR connections mapped to a single class. 

2 Background 

The task of integrating IP Differentiated Services and ATM QoS is not straight- 
forward because of their inherent implementation differences. One of the major 
difficulties in merging the QoS architecture of the two technologies is that there 
is no service category in ATM that is similar to that of AF PHB. AF was de- 
veloped to support those applications that required a minimum guaranteed rate 
or end-to-end delay but did not need a channel dedicated to them such as in 
the Premium Services. Additionally, AF incorporates the concept of relativity 
whereby customers have the ability to prioritize different flows emerging out
of their domain. Although AF has many attractive features, its deployment
will be difficult if there are no efficient mechanisms to integrate it with other
technologies.

The problem of mapping AF to an appropriate ATM service category has
caught the attention of several researchers. Rabbat et al. have proposed two
mapping mechanisms, to ABR and GFR. In both schemes, the focus
was on the effective throughput and AF relativity. Rogers et al. suggested a
mapping of AF to VBR in their study of a new shaping algorithm for DiffServ.
The scope of their study was the traffic conditioning mechanisms for Differentiated
Services. The study of the performance characteristics, end-to-end delay
and jitter in particular, for mapping the real-time categories of AF to ATM is
yet another interesting research topic.

This paper develops a framework to map the AF PHB to the VBR service 
category in ATM. We exploit the advantages of VBR to match all the types
of applications targeted by AF. We further show that relativity can be achieved
by tuning the traffic parameters used for different VBR services. The proposed
architecture is verified by simulations using the LBNL (Lawrence Berkeley
National Laboratory) Network Simulator (ns) [10].

3 Proposed Mapping 

In designing the architecture, there are two issues to be considered: (i) the posi- 
tion of the mapper in an intermixed IP and ATM network, (ii) the QoS param- 
eters that must be mapped. 

The translation of DiffServ to ATM must happen at the IP boundary on a 
per-aggregate basis. Translation in the ATM domain may lead to complications 
due to the connection oriented nature of ATM. Each AF aggregate exiting a 
DiffServ domain would be mapped to a different VBR Virtual Circuit (VC) in 
an ATM domain. The real-time aggregates (for example, multimedia applications)
are mapped to rt-VBR (real-time VBR) and non-real-time applications
are mapped to nrt-VBR (non-real-time VBR). In ATM, the traffic parameters
corresponding to a service category are accepted at connection establishment
time through the Connection Admission Control (CAC) procedure.

In case of VBR, the parameters that constitute the service characteristics are 
the Peak Cell Rate (PCR), Sustainable Cell Rate (SCR), Maximum Burst Size
(MBS in cells), and Cell Delay Variation Tolerance (CDVT). The service parameters
used in AF are Peak Information Rate (PIR), Committed Information Rate 
(CIR), Maximum Burst Size (MBS in packets) and Packet Delay Variation. 
Packet Delay Variation is an optional parameter and is mostly used when the 
application is real-time. The mapping from AF to VBR is done as follows: 

- PIR to PCR.

- CIR to SCR.

- PDV/cells per packet to CDVT.

- MBS*packetsize to MBS*cellsize.
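
The four mapping rules above can be sketched as a small conversion routine. This is an illustration under assumptions that the paper does not spell out: rates are given in bit/s and converted to cells/s using a 48-byte cell payload, the burst is converted so that the number of bytes is preserved, and AAL overhead is ignored. The type and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional

ATM_CELL_PAYLOAD_BYTES = 48  # assumption: payload bytes carried per cell


@dataclass
class AfProfile:
    pir_bps: float                 # Peak Information Rate
    cir_bps: float                 # Committed Information Rate
    mbs_packets: int               # Maximum Burst Size in packets
    packet_size_bytes: int
    pdv_s: Optional[float] = None  # Packet Delay Variation (optional)


@dataclass
class VbrDescriptor:
    pcr_cps: float                 # Peak Cell Rate in cells/s
    scr_cps: float                 # Sustainable Cell Rate in cells/s
    mbs_cells: int                 # Maximum Burst Size in cells
    cdvt_s: Optional[float]        # Cell Delay Variation Tolerance


def map_af_to_vbr(af):
    """Translate an AF traffic profile into VBR traffic parameters."""
    bits_per_cell = ATM_CELL_PAYLOAD_BYTES * 8
    cells_per_packet = math.ceil(af.packet_size_bytes / ATM_CELL_PAYLOAD_BYTES)
    burst_bytes = af.mbs_packets * af.packet_size_bytes
    return VbrDescriptor(
        pcr_cps=af.pir_bps / bits_per_cell,                            # PIR -> PCR
        scr_cps=af.cir_bps / bits_per_cell,                            # CIR -> SCR
        mbs_cells=math.ceil(burst_bytes / ATM_CELL_PAYLOAD_BYTES),     # same burst in cells
        cdvt_s=None if af.pdv_s is None else af.pdv_s / cells_per_packet,
    )
```

A profile with 480-byte packets, a burst of 5 packets and a PDV of 10 ms would, under these assumptions, yield a burst of 50 cells and a CDVT of 1 ms.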

It is important to tune SCR and CDVT for the real-time applications. In 
case of the AF relativity feature, the relative priority is usually assigned on the 
basis of the amount of bandwidth shared at a particular time in transmission. 
Therefore, the important parameters to consider are the SCR and the MBS. 
Another parameter to consider is the Cell Loss Priority (CLP). The CLP is
particularly useful after connection establishment. All the packets that conform to
SCR are marked as good (CLP=0) and the non-conforming ones are marked as 
bad (CLP=1). In case of DiffServ, there are three levels of drop precedence while 
CLP can be assigned only two values. To address this issue, packets arriving at 
a rate 
[Figure label: ATM SWITCH]

Fig. 1. Architecture of ATM implementation on ns.



where δSCR = ±10% of SCR value are marked as good. The other option is 
to use the VBR.3 category of ATM, in which cells are tagged and service is 
degraded instead of cells being discarded during times of congestion. 

4 Simulation Setup and Experiments 

The LBNL Network Simulator (ns) with the DiffServ and ATM enhancements 
was used for our experiments. The Simulator has the facility to simulate IP 
networks with the RSVP and DiffServ QoS mechanisms. We enhanced the Simulator 
to incorporate ATM functionality as well. 



4.1 ATM Simulator 

The added ATM feature comprises two main components, an ATM End Station 
and an ATM Switch. The ATM Switch consists of a Connection Manager, a Traffic 
Conditioner and a Queue Scheduler. Figure 1 depicts the design of the ATM 
Simulator. 

The Connection Manager provides the functions to create and delete ATM 
Permanent Virtual Circuits (PVCs), and to look up the database of created PVCs. The 
Traffic Conditioner performs the Connection Admission Control (CAC), 
the Traffic Policing/Usage Parameter Control (UPC) and the Traffic Shaping 
functions. The Queue Scheduler schedules the traffic on the link. Queuing is 
done on a per-VC basis to provide fairness to all traffic, especially during 
congestion. Two different scheduling mechanisms, namely Priority and Weighted Round 
Robin (WRR), were considered. At the high level, priority is given on the basis of 
ATM QoS classes, i.e., 0 for CBR, 1 and 2 respectively for rt-VBR and nrt-VBR, 
3 for ABR, 4 for GFR and 6 for UBR. Between the various VC Queues of each 
category, Weighted Round Robin scheduling was used. The weights depend on 
the Peak Cell Rate (PCR) for CBR, the Sustainable Cell Rate (SCR) for real-time 
and non-real-time VBR, and the Minimum Cell Rate (MCR) for ABR and GFR. For 
UBR, the weights assigned to all VCs were the same since the category is best-effort. 
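
The two-level scheduling described above (strict priority between service categories, Weighted Round Robin between the VC queues of one category) can be sketched as follows. This is an illustrative sketch and not the ns implementation: the `VcQueue` class, the `schedule_round` function, and the use of small integer weights (cells served per round) are assumptions; in the paper the weights are derived from PCR, SCR or MCR.

```python
from collections import deque

# Priority per ATM service category, using the values listed in the text
# (a lower value is served first).
CATEGORY_PRIORITY = {"CBR": 0, "rt-VBR": 1, "nrt-VBR": 2, "ABR": 3, "GFR": 4, "UBR": 6}


class VcQueue:
    """One per-VC queue (hypothetical helper for this sketch)."""

    def __init__(self, name, category, weight):
        self.name = name
        self.category = category
        self.weight = weight   # assumed: cells this VC may send per WRR round
        self.cells = deque()


def schedule_round(vcs):
    """Serve one round and return the (vc name, cell) pairs that were sent.

    Only the highest-priority category that has backlogged cells is served;
    within that category each VC sends up to `weight` cells.
    """
    backlogged = [vc for vc in vcs if vc.cells]
    if not backlogged:
        return []
    top = min(CATEGORY_PRIORITY[vc.category] for vc in backlogged)
    sent = []
    for vc in backlogged:
        if CATEGORY_PRIORITY[vc.category] != top:
            continue
        for _ in range(min(vc.weight, len(vc.cells))):
            sent.append((vc.name, vc.cells.popleft()))
    return sent
```

With two nrt-VBR queues of weights 2 and 1 and one UBR queue, the UBR queue is served only once both nrt-VBR queues are empty, which mirrors the starvation risk for low-priority traffic that strict priority implies.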
The ATM End Station provides the facility to perform the 
segmentation of IP packets into cells and the reassembly of cells into IP packets using the ATM 
Adaptation Layer 5 (AAL5) protocol. 
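
A simplified sketch of AAL5-style segmentation and reassembly is given below. It follows the general AAL5 layout (payload, padding, and an 8-byte trailer, cut into 48-byte cell payloads), but it is an assumption-laden illustration rather than the End Station code: the function names are hypothetical, the 5-byte cell headers are omitted, and the CRC-32 field is left as zero instead of being computed.

```python
import struct

PAYLOAD_BYTES = 48   # payload carried by one ATM cell
TRAILER_BYTES = 8    # AAL5 trailer: UU (1), CPI (1), Length (2), CRC-32 (4)


def segment(packet: bytes) -> list[bytes]:
    """Split one packet into 48-byte cell payloads (cell headers omitted)."""
    # Pad so that packet + padding + trailer is a multiple of 48 bytes.
    pad_len = -(len(packet) + TRAILER_BYTES) % PAYLOAD_BYTES
    # Simplification: the CRC-32 field is written as 0, not computed.
    trailer = struct.pack("!BBHI", 0, 0, len(packet), 0)
    pdu = packet + bytes(pad_len) + trailer
    return [pdu[i:i + PAYLOAD_BYTES] for i in range(0, len(pdu), PAYLOAD_BYTES)]


def reassemble(cells: list[bytes]) -> bytes:
    """Rebuild the original packet from the cell payloads of one PDU."""
    pdu = b"".join(cells)
    _uu, _cpi, length, _crc = struct.unpack("!BBHI", pdu[-TRAILER_BYTES:])
    return pdu[:length]
```

A 1000-byte packet plus the 8-byte trailer fills exactly 21 cells, whereas a 41-byte packet needs two cells because the trailer no longer fits in the first one.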



4.2 Topology and Experiments 

A network topology used by most researchers for the study of QoS is shown in 
Figure 2. There are 12 sources (S1 ... S12) and destinations (D1 ... D12) on 
either side of a core network consisting of 6 Edge Routers (ER1 ... ER6) and 
two ATM switches (SW1 and SW2) separated by a bottleneck link as shown. All 
the links from the sources to the Edge Routers and from the Edge Routers to the destinations 
were 6 Mbps. The links from Edge Routers to Switches and vice versa were 25 
Mbps. The bottleneck link was 40 Mbps. The links were chosen such that the 
only bottleneck in the network was the core, i.e., the link between the switches. A 
small propagation delay of 5 ms was also applied to 
all the links. The traffic sources used were CBR with UDP Transport Agents 
as real-time generators and FTP with TCP Transport Agents as non-real-time 
generators. At the sources, each traffic flow is assigned to one of the four different 
AF classes (Platinum, Gold, Silver, and Bronze). The relatively low transmission 
rates were chosen in order to keep the number of packets generated, and hence 
the simulation times, at a reasonable level. 
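
The claim that the core link is the only bottleneck can be checked with a few lines of arithmetic. The sketch below uses the link rates stated above; the split of 4 sources per ingress Edge Router over 3 ingress Edge Routers is inferred from the 12 sources, the 6 Edge Routers, and the per-edge aggregation described in the experiments, and the function name is hypothetical.

```python
ACCESS_LINK_MBPS = 6        # source to Edge Router
EDGE_TO_SWITCH_MBPS = 25    # Edge Router to ATM switch
BOTTLENECK_MBPS = 40        # link between SW1 and SW2
SOURCES_PER_EDGE_ROUTER = 4  # inferred: 12 sources over 3 ingress Edge Routers
INGRESS_EDGE_ROUTERS = 3


def max_offered_load_mbps():
    """Return (peak load per Edge Router, peak load offered to the core)."""
    per_edge = min(SOURCES_PER_EDGE_ROUTER * ACCESS_LINK_MBPS, EDGE_TO_SWITCH_MBPS)
    return per_edge, per_edge * INGRESS_EDGE_ROUTERS


per_edge, core = max_offered_load_mbps()
# The edge links can carry their worst-case aggregate, the core link cannot.
edge_is_bottleneck = SOURCES_PER_EDGE_ROUTER * ACCESS_LINK_MBPS > EDGE_TO_SWITCH_MBPS
core_is_bottleneck = core > BOTTLENECK_MBPS
```

Under these figures each Edge Router can offer at most 24 Mbps, which fits its 25 Mbps uplink, while the three aggregates together can offer up to 72 Mbps to the 40 Mbps core link.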




Fig. 2. Network topology for experiments. 



Two sets of experiments were conducted: (i) to test the performance characteristics 
(throughput, end-to-end delay, and jitter) of real-time sources alongside non-real-time sources, 
and (ii) to test AF relativity. The experiments involved varying the 
source rates and the queue lengths, and tuning the parameters, i.e., SCR, PCR and 
the MBS. Traffic entering the Edge Routers (ERs) is scheduled with differentiation 
performed using DiffServ. At the edge, the segmentation function is applied 
to convert packets to cells before scheduling them on the link. 

For the first set of experiments, three different experiments were conducted. 
In the first experiment, performance measurements of the network without any 
ATM, i.e., with two core DiffServ-enabled IP routers, were obtained. For 
Experiments 2 and 3, three PVCs were added, one between each pair of incoming and outgoing 
Edge Routers. The aggregated traffic from 4 sources on each edge was 
transmitted on a single PVC. Each PVC was associated with a Traffic Descriptor 
that includes PCR, SCR, Maximum Burst Size (MBS) and Cell Delay Variation 
Tolerance (CDVT) for real-time and non-real-time VBR. The traffic parameters 
were assigned according to the mapping explained in Section 3. For the second 
experiment, we obtained results by mapping DiffServ to the UBR service category. 
For the third experiment, we had DiffServ mapped to VBR. As explained in 
Section 3, we mapped the traffic parameters of ER1 to rt-VBR (since this received traffic 
from CBR sources) and the traffic parameters of ER2 and ER3 to nrt-VBR, but the 
parameter values were different for ER2 and ER3. The first set of experiments 
was as follows: 

Experiment 1: The source rates of the CBR sources were varied, keeping the 
Committed Information Rate (CIR) value in Experiment 1 and the equivalent 
mapped SCR in Experiment 3 constant. The variation of throughput, delay 
and jitter in Experiments 1, 2, and 3 was studied. Varying the 
transmission rates of the sources attached to TCP agents is not necessary since the 
TCP sources adjust their rates according to the feedback from the network. 
Experiment 2: A study was conducted of how delay and jitter in the mapping of DS to UBR 
vary with the queue lengths in the network. 

For the second set of experiments, 6 PVCs were added: one between the ER1, 
ER4 pair, one between the ER3, ER6 pair and 4 between the ER2, ER5 pair. In this 
experiment, the traffic parameters used on ER1 and ER4 pertained to the EF 
service category of DiffServ and they were mapped to the CBR service category 
in ATM. The ER2, ER5 pair was configured to perform service differentiation 
using 4 different AF codepoints to yield AF relativity. The 4 PVCs between ER2 
and ER5 correspond to the 4 different codepoints used on ER2 and ER5. All 
4 PVCs were associated with the nrt-VBR service category, but with SCR equal to 
the Committed Information Rate (CIR) associated with each codepoint on the 
edge routers. The PVC between ER3 and ER6 was associated with the UBR service 
category. In this study, the following experiment was performed: 

Experiment 3: The source rates were kept constant, and the SCRs of the 4 PVCs, 
mapped from the CIRs of the 4 different codepoints on ER2, 
were varied. For each of the variations, the SCR 
of PVC3 was 75% of the SCR of PVC2, the SCR of PVC4 was 50% of that of PVC2 
and the SCR of PVC5 was 25% of that of PVC2. The throughputs of the 4 PVCs were 
verified to be relative to each other. 
Fig. 3. Results of Experiment 1. 



5 Simulation Results 



For the first set of experiments explained in Section 4, the emphasis is on 
the behavior of real-time applications in AF mapped to the rt-VBR category. 
Therefore, the following results pertain to the total achieved rate, average delay, 
and average jitter obtained from the sources S1 through S4, which are aggregated 
to a single code point on router ER1. 

Figures 3(a), 3(b), and 4(a) display the results of Experiment 1. In each of the 
graphs, the behavior of the network without ATM and with ATM is shown. 
The expected performance of the network here is that guaranteed service is 
provided if the source behaves as requested. Since the source places a stringent 
requirement on service, non-adherence to the service must be treated strictly by 
policing out excess traffic. The graphs clearly show that UBR neither maintains 
consistency in delay and jitter nor polices traffic entering beyond the 
requested rate. The network consists of TCP and CBR sources. Allowing excess 
traffic from the CBR sources reduces the bandwidth left for other traffic. TCP sources depend 
on the feedback obtained from the network, and therefore an excess allowed CBR 
rate causes a reduction in the achieved TCP rate. The achieved rate of the TCP sources 
varied between 3 and 15 Mbps in the case where the total source rate of the CBR 
sources was 20 Mbps, which left 20 Mbps of bandwidth available once all the CBR 
sources were accommodated. The behavior of the UBR category is undesirable 
as it leads to starvation of low priority sources in the network. In case of DS 
and VBR, we saw that the achieved TCP rates were approximately 25 Mbps 
because the traffic was policed at 15.8 Mbps (PIR/PCR). We only see about 
14.4 Mbps sustained rate because the agreed CIR/SCR was 14.4 Mbps. 

Figures 4(b) and 5(a) present the results of Experiment 2. The UBR/ABR 
and GFR scheduling mechanisms used in the ATM switches are designed to 
utilize the bandwidth in the network to the fullest possible extent. The queue 
sizes in the switches are dynamically allocated until the maximum threshold 
of the system is reached. The sizes are allocated with respect to the incoming 
traffic. The delay and jitter experienced by these service categories are mainly due 
to the scheduling. In these categories, since traffic is not strictly policed beyond 
the guaranteed rates, traffic is not dropped until the maximum size is hit. In 
case of VBR and AF (higher codepoints only), the buffer sizes do not affect 
the delay and jitter due to strict policing. The figures display this behavior. 
This experiment also shows that congestion in the network, and the resulting multiple 
retransmissions of data that further aggravate the condition of the network, 
can be avoided when malicious resource utilization is properly checked. 

Fig. 5. Results of Experiment 2. 

Figures 5(b), 6(a), and 6(b) present the additional results of Experiment 2. 
A source rate of 5 Mbps per source is picked for the study. Real-time 
applications expect the performance to remain constant with time. The graphs 
indicate this consistency. 

In the second set of experiments explained in Section 4, the relativity is 
basically the measure of relative throughput to be maintained through the AF PHB 
group. In this experiment, the AF PHB group comprises sources S5 through S8. 

Fig. 7. Experiment 3: AF relativity with VBR mapping. 



Figure 7 presents the results of Experiment 3. Comparing the results to those 
of the experiments by Rabbat et al., we see that the relativity obtained 
is similar. We verified the relativity feature with different sets of CIR to SCR 
values. Table 1 displays the results. The first column in the table contains the 
CIR/SCR used for the Platinum source. The CIR/SCR values of Gold, Silver 
and Bronze were 75%, 50% and 25% of the Platinum value respectively, as explained 
in Section 4. 
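
The relativity can be read directly off Table 1 by dividing each achieved throughput by the Platinum throughput of the same row and comparing it with the configured share. The sketch below does this for the four rows of Table 1; the function name and the 0.05 tolerance are assumptions chosen for illustration, and the throughput values are assumed to be in Mbps.

```python
# Rows of Table 1: Platinum CIR/SCR -> achieved (Platinum, Gold, Silver, Bronze).
TABLE_1 = {
    5.088: (5.0, 3.8, 2.54, 1.2),
    4.24: (4.16, 3.18, 2.12, 1.06),
    3.392: (3.4, 2.52, 1.56, 0.81),
    2.544: (2.5, 1.9, 1.26, 0.6),
}

# Configured SCR shares relative to Platinum (Section 4, Experiment 3).
TARGET_SHARES = (1.0, 0.75, 0.50, 0.25)


def relative_shares(achieved):
    """Achieved throughput of each class as a fraction of Platinum's."""
    platinum = achieved[0]
    return tuple(rate / platinum for rate in achieved)


def is_relative(achieved, tolerance=0.05):
    """True if every class is within `tolerance` of its configured share."""
    return all(
        abs(share - target) < tolerance
        for share, target in zip(relative_shares(achieved), TARGET_SHARES)
    )


all_rows_relative = all(is_relative(row) for row in TABLE_1.values())
```

For the first row, Gold, Silver and Bronze achieve roughly 0.76, 0.51 and 0.24 of the Platinum throughput, close to the configured 0.75, 0.50 and 0.25.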



6 Conclusion 

In this paper, the importance of QoS with an emphasis on efficient interoperation 
of QoS in IP and ATM networks was discussed. A framework for the translation 
of AF PHB to the VBR service category was put forth. The experimental setup 
and the results show that VBR is a suitable category for all types of applications 
targeted by the AF PHB. It was further shown that AF relativity can be achieved 
by tuning the VBR traffic parameters. 
CIR/SCR of Platinum    Achieved Throughput 
(Mbps)                 Platinum   Gold   Silver   Bronze 

5.088                  5.0        3.8    2.54     1.2 
4.24                   4.16       3.18   2.12     1.06 
3.392                  3.4        2.52   1.56     0.81 
2.544                  2.5        1.9    1.26     0.6 

Table 1. Achieved rates of the mapped AF Olympic classes 
Abstract. We consider an SLA (Service Level Agreement) committed between 
two parties to use the guarantee of QoS provided by a QoS Enabled Network 
(QEN). A QEN can guarantee QoS because a Bandwidth Broker (BB) 
manages its resources based on a policy. The IETF and the DMTF have done 
extensive work in this field and have proposed a policy framework in order to 
store and manage policy; the standardization process is still going on. 
However, not much work has been done on the realization and 
management of SLAs, which is necessary in order to utilize the QoS provided by 
the network layer as perceived by the service layer. In this paper, we propose a 
new methodology of SLO templates (machine readable descriptions of a human 
readable textual SLA) to negotiate and commit SLAs that utilize the QoS provided by 
networks as perceived by businesses, such as a virtual leased line service. 
We also describe the implementation of our proposed SLO templates in our BB. 



1 Introduction 

The guarantee of quality of service (QoS) is one of the most important issues for 
businesses and common users who use the Internet for their mission critical applications 
[6]. The research community and standards organizations like the IETF (Internet 
Engineering Task Force) and the DMTF (Distributed Management Task Force) have 
been working to achieve this objective. To this end, the IETF has proposed two 
main architectures, IntServ [4] and DiffServ [1], to guarantee network 
level QoS. These architectures provide service guarantees to users by making network 
components (routers) differentiate between high priority and low priority traffic 
on the basis of policy. For the maintenance of policy, the IETF 
and DMTF have proposed a framework called “Policy Framework” [9]. 

Part of the policy provided by the network administrator is derived from the SLA 
(Service Level Agreement) negotiated and committed between network service 
provider(s) (generally known as ISPs) and service consumer(s) (either ISPs or 
common users or both). The DiffServ framework suggests that the negotiated and 
committed SLA in human readable textual form may be represented by SLOs 
(Service Level Objectives) for machine readability [13]. These SLOs consist of some 
parameters and their values. Although this basic framework of SLOs has been defined, 
not much work has been done to use the parameters suggested by the DiffServ 
framework to create meaningful services as perceived by the users or applications. 

We propose a new concept of SLO templates. We suggest ways to use these 
templates to create services useful for users or applications. The SLO templates are 
used to negotiate and commit SLAs, which are then enforced over DiffServ 
architecture by our bandwidth broker. These templates are designed in order to satisfy 
QoS requirements perceived at the service/application level and are implementable 
over DiffServ architecture. We expect that with time and usage some of the templates 
will be discarded, a few will become useful in local environments and the others may 
become standard services. In this paper, we also describe the implementation of our 
proposed SLO templates in our BB. 

The rest of the paper is organized as follows. Section 2 gives the current status of 
the research in the field of SLA and QoS in terms of the DiffServ architecture. In this 
section, we also describe the motivation behind our proposal of SLO templates. 
Section 3 describes our proposed SLO templates. Section 4 describes the implementation of 
our proposed SLO templates in our BB, the experiments 
performed, and the lessons learnt during the implementation and 
experiments. We provide a summary and conclusion in Section 5 and outline future work in Section 6. 



2 Current Status of SLA and QoS 

In this section, we briefly describe the current status of SLA and QoS strictly in terms 
of the DiffServ architecture. This is because we want to focus on providing services over 
the DiffServ architecture, which provides assurance of QoS and is supposed to be 
scalable at the same time. We do not intend, nor does space permit us, 
to cover these two fields in a broad sense. 



2.1 Our Motivation: SLA for Network QoS 

An SLA describes a high-level business policy and is converted into SLOs (Service 
Level Objectives) for machine readability. An SLO consists of parameters and their 
values. In DiffServ, SLO parameters are identifiable [14][15]. It is suggested that 
these parameters can be used to establish services like a Virtual Leased Line (VLL). 
The types of parameters contained in an SLO and their values may vary even to 
construct the same type of service depending on the negotiators of the SLO. On the 
basis of the negotiators, we divide SLOs into two types. 



User-ISP SLO: It is committed between a common user and an ISP. 



ISP-ISP SLO: It is committed and agreed upon between ISPs. We call this an 
Inter-Domain SLO because it is negotiated between ISPs on a per-domain basis. 

These two types have to be designed separately because the service requirements 
may be different in each case. In this paper we focus on the ISP-ISP SLO only. 




400 M. Hashmani et al. 



At present, it is necessary to decide values of all of the parameters to construct a 
service. For a committed SLO, the values of these parameters are decided after 
negotiation between service providers and service consumers. Negotiating the values 
of these parameters is a very complex issue, which depends on many factors including 
but not limited to business model, service to be provided, technical limitations, etc. A 
concrete framework or study in this regard is not available. In this paper, we attempt 
to provide a solution to this problem (see Sect. 3). 



2.2 Policy Framework 

One way of providing the service agreed upon in an SLO is to derive a set of policy 
rules and manage a network according to these policy rules. The policy rules do not 
necessarily consist only of those derived from the SLO; they may also be 
provided by other sources, such as the network administrator. These policy rules can be 
stored and managed (translation, conversion, conflict detection, etc.) using a 
framework called Policy Framework (PF) proposed by the IETF. PF states that one 
global set of policy rules is neither flexible enough nor suitable to manage various 
policy rules. Therefore, it suggests management of policy rules at various levels of 
abstraction, namely, High-Level Business Policy, Device Independent Policy, and 
Device Dependent Policy (see Fig. 1). It might be necessary to convert high-level 
policy rules derived from an SLO into device independent policy rules before they are 
converted into device dependent policy rules, which are finally enforced over the 
network components. 



[Figure: three layers, from top to bottom: High-Level Business Policy, Device Independent Policy, Device Dependent Policy] 

Fig. 1. Policy Framework showing the concept of managing policy rules in layers 
2.3 Our BB 

In 1999, an initiative was taken by CKP/NGI to provide a guarantee of end-to-end QoS 
over a DiffServ domain for content business [6]. We developed a server to record and 
manage the usage of network resources (bandwidth). We call this Bandwidth Broker 
(BB) ENICOM’s BB. Our BB performs admission control and router configuration 
on the basis of the provided policy (Fig. 2). Since its first appearance, our BB has been 
constantly enhanced to include many new features and accommodate new standards 
as well [7] [8]. We have performed several QoS experiments deploying our BB over 
wide area networks (WAN) in Japan. Although we did not initially design our BB to 
perform SLA negotiation, its inherent feature of performing resource 
management and admission control on the basis of policy rules means that it can be easily 
enhanced to negotiate and enforce SLAs. 




Fig. 2. Basic concept of our BB 



3 Our Proposal 

As we know, sufficient research has already been done on obtaining a guarantee of QoS 
from the network. In this regard, two models/architectures have wide popularity, namely 
Integrated Services (IntServ) and Differentiated Services (DiffServ). We have chosen 
DiffServ because it is more likely to scale well. 

A well-known example of creating services over DiffServ is a service similar to a 
leased line (also called Virtual Leased Line or VLL). Owing to their QoS capabilities, 
such services are expected to be used by critical business applications like 
content business, Voice over IP (VoIP), video conferencing, telemedicine, etc. But 
these can only be realized when service providers and consumers establish a service 
level agreement (SLA). Generally speaking, business SLAs are a combination of text 
and a set of parameters and their values. To make an SLA machine readable, it is 
represented by one or more Service Level Objectives (SLOs). An SLO is a set of 
parameters and their values. Fortunately, in case of DiffServ, some primitive 
parameters have already been identified [14][15]. But these alone are not sufficient, 
and more parameters need to be identified to finalize an SLA. 

In the next step, the service consumers and service providers need to negotiate 
the values of these parameters. This process of negotiation is very cumbersome for 
many reasons. For example, a large number of parameters makes it difficult for service 
providers and consumers to find a suitable combination that satisfies their needs. 
Some values of parameters may not be supportable (technically or otherwise) by 
service providers. This means renegotiation may be required. 



3.1 SLO Templates 

In order to overcome the above stated problem, we propose a new idea of SLO 
templates. Each SLO template consists of variable and constant parameters (Table 1). 
Variable parameters are the ones whose values are negotiated between service 
providers and consumers. The constant parameters are those whose values are fixed 
and are not negotiable due to reasons like poor feasibility, resources-not-available, 
technical faults, etc. For example, resources-not-available may limit the values of 
delay that can be supported. 

In Table 1, we have listed those parameters that we use to create SLO templates 
based on the concept described in the above paragraph. It is not our intention to create 
an exhaustive list of parameters. Rather, our purpose is to list those parameters that 
can be used to create SLO templates, which in turn may be used to create useful 
services for business applications. The first four parameters, namely PHB (Per Hop 
Behavior), BW (Bandwidth), BT (Burst), and DY (Delay), are directly related to 
DiffServ parameters, and the agreed upon values of these parameters can be fulfilled 
using DiffServ parameters. On the other hand, the last two parameters, namely AY 
(Availability) and CF (Cost Factor), are high level parameters, and the agreed upon 
values of these parameters cannot be fulfilled using low-level parameters. Rather, 
these are fulfilled using admission control in the BB. 

Table 1. Proposed SLO templates (the parameters whose values are not specified (given as xx) 
are variable parameters and the others are constant parameters) 

SLO Template   PHB   BW (Kbps)   BT (KB)   DY (msec)   AY (%)   CF 

Premium1       EF    xx          xx        20          100      F0 
Premium2       EF    xx          xx        30          90       F1 
Premium3       EF    xx          xx        40          80       F2 
Premium4       EF    xx          xx        50          50       F3 



We now briefly explain AY and CF. The AY parameter indicates the maximum 
share of a sub-service (e.g. Premium1) in the total amount of resources allocated to a 
service (e.g. Premium). For example, a 90% value of AY for Premium2 in Table 1 


































Management and Realization of SLA for Providing Network QoS 403 



indicates that at maximum Premium2 will be allocated 90% of Premium resources. 
However, a comprehensive algorithm is yet to be developed to ensure strict 
compliance with the allocated share of resources. On the other hand, the parameter CF 
indicates a factor that is multiplied by the unit cost of the resource. For example, 
if the unit cost of 1 Kbps of bandwidth with a burst size of 100 KB is C, then for a 
Premium1 user availing only one unit, the total cost is calculated as C x F0. 
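
The cost rule in the example above can be written as a one-line function. In the sketch below, the numeric cost factors are hypothetical placeholders (the paper gives F0 to F3 only symbolically), and the linear scaling with the number of units is an assumption that goes beyond the single-unit example in the text.

```python
# Hypothetical numeric cost factors; the paper only names them F0..F3.
COST_FACTORS = {"Premium1": 2.0, "Premium2": 1.5, "Premium3": 1.2, "Premium4": 1.0}


def total_cost(template: str, unit_cost: float, units: int = 1) -> float:
    """Cost of `units` resource units: unit cost multiplied by the cost factor.

    One unit is 1 Kbps of bandwidth with a burst size of 100 KB, as in the text.
    Linear scaling with `units` is an assumption of this sketch.
    """
    return unit_cost * COST_FACTORS[template] * units
```

With a unit cost C of 10.0, a Premium1 consumer using one unit would pay 10.0 x F0, which is 20.0 under the placeholder value chosen here.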

3.1.1 Constant Parameters 

The constant parameters are those whose values are not negotiable for any SLO 
template. For example, in Table 1 all four SLO templates (Premium1, Premium2, 
Premium3, and Premium4) are designed to work only over the Expedited Forwarding 
(EF) PHB of DiffServ. The other constant parameters are DY, AY and CF. These 
SLO templates are designed in such a way that Premium1 is better than Premium2, 
Premium2 is better than Premium3 and so on. For example, the delay for Premium1 is 
less than that for Premium2. On the other hand, the availability and the cost factor are greater for 
Premium1 than for Premium2; that is, F0 > F1 > F2 > F3. 

3.1.2 Variable Parameters 

In Table 1, these parameters are indicated by ‘xx’ in their columns. The values of 
these parameters are not predetermined and are decided after negotiation. We have 
decided to use Bandwidth (BW) and Burst (BT) size as variable parameters because 
these vary substantially from one service to another. Therefore, these parameters are 
also the ones primarily used in the admission control of the BB. The values of these parameters will 
eventually determine the total cost of the service used. 
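
The split between constant and variable parameters suggests a simple data structure: a template fixes PHB, DY, AY and CF, and a negotiated SLO adds the agreed BW and BT. The sketch below illustrates this with the values of Table 1. The class and function names are hypothetical and do not reflect the information model actually used in our BB; the eligibility set stands in for the policy that decides which templates a consumer may see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloTemplate:
    """Constant parameters of one SLO template (values from Table 1)."""
    name: str
    phb: str
    delay_ms: int
    availability_pct: int
    cost_factor: str   # symbolic, as in Table 1


@dataclass(frozen=True)
class NegotiatedSlo:
    """A template plus the negotiated variable parameters."""
    template: SloTemplate
    bandwidth_kbps: int
    burst_kb: int


TEMPLATES = {
    t.name: t
    for t in (
        SloTemplate("Premium1", "EF", 20, 100, "F0"),
        SloTemplate("Premium2", "EF", 30, 90, "F1"),
        SloTemplate("Premium3", "EF", 40, 80, "F2"),
        SloTemplate("Premium4", "EF", 50, 50, "F3"),
    )
}


def negotiate(template_name, bandwidth_kbps, burst_kb, eligible):
    """Fill in the variable parameters of a template the consumer may use."""
    if template_name not in eligible:
        raise PermissionError(f"{template_name} is not offered to this consumer")
    if bandwidth_kbps <= 0 or burst_kb <= 0:
        raise ValueError("bandwidth and burst size must be positive")
    return NegotiatedSlo(TEMPLATES[template_name], bandwidth_kbps, burst_kb)
```

Only BW and BT are passed to `negotiate`; the constant parameters cannot be altered by the consumer because the template objects are immutable.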



4 Implementation & Experiments 

4.1 Implementation 

Our BB has been enhanced so that it can now negotiate SLOs using our concept of 
SLO templates. We now explain the data flow and operations performed at various 
stages in our implementation. In Fig. 3, each box represents a collection of data and 
an arrow represents an operation performed. The box at the tail of the arrow indicates the 
data before an operation is performed and the box at the head of an arrow represents 
the data after the operation is over. 

First of all, the consumer is shown those SLO templates for which he/she is 
eligible (determined by policy). The consumer then negotiates the 
values of the variable parameters only. This is done using an HTML interface designed 
for this purpose (Fig. 4). The service provider performs conflict detection and a 
consistency check, and the result is a Negotiated SLO (NSLO). 

Fig. 3. Data Flow Diagram Showing Data Flow From SLO Templates to Router Configuration 

Fig. 4. One page (Japanese) to negotiate variable parameters of SLO templates. Each value box 
respectively represents bandwidth, burst size, delay, PHB, availability, and time duration 

At the second step, the NSLO is converted into device independent policy to be stored 
along with other policy rules. At this stage, the consistency check and conflict 
detection against other policy rules are performed once again. In our present implementation, we 
perform only simple checks; for example, only one SLO for Premium service may 
exist for any service consumer at any instant in time. The consistency and conflict 
detection checks need to be enhanced as new and complex policy rules are added. For 
example, one such check would be to make sure that the bandwidth reserved through 
SLOs for all service consumers is not more than a certain percentage of the available 
Premium bandwidth. At stage three, the device independent policy is converted into device 
dependent policy and is then enforced by performing the proper router configurations. 
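
The two checks mentioned above (at most one Premium SLO per consumer, and a cap on the total bandwidth reserved through SLOs) can be sketched as a small registry. This is an illustration under stated assumptions, not the check implemented in our BB: the `SloRegistry` class, the `PolicyConflict` exception, and the capacity and fraction values in the usage note are hypothetical.

```python
class PolicyConflict(Exception):
    """Raised when a negotiated SLO conflicts with an existing policy rule."""


class SloRegistry:
    """Keeps track of Premium SLOs and applies two simple consistency checks."""

    def __init__(self, premium_capacity_kbps, max_reserved_fraction):
        self.premium_capacity_kbps = premium_capacity_kbps
        self.max_reserved_fraction = max_reserved_fraction
        self._reserved = {}   # consumer name -> reserved bandwidth in Kbps

    def reserved_kbps(self):
        return sum(self._reserved.values())

    def register(self, consumer, bandwidth_kbps):
        # Check 1: only one Premium SLO per consumer at any instant.
        if consumer in self._reserved:
            raise PolicyConflict(f"{consumer} already holds a Premium SLO")
        # Check 2: total reservations stay within the allowed share of capacity.
        limit = self.premium_capacity_kbps * self.max_reserved_fraction
        if self.reserved_kbps() + bandwidth_kbps > limit:
            raise PolicyConflict("reservation exceeds the allowed Premium share")
        self._reserved[consumer] = bandwidth_kbps
```

For example, with `SloRegistry(10000, 0.5)` a first reservation of 3000 Kbps is accepted, while a later request that would push the total above 5000 Kbps is rejected and would have to be renegotiated.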



4.2 Experiments 

To validate our concept and its implementation, we performed 
experiments over a LAN, which is divided into three small DiffServ domains (Fig. 5). 
The main objectives are to confirm SLO negotiation using SLO templates, to confirm 
registration of the Negotiated SLO (NSLO), and to confirm the translation/conversion of 
the NSLO into final router configurations. 

Each DiffServ domain (Domain1, Domain2, and Domain3) in Fig. 5 is managed by 
a single BB, namely BB1, BB2 and BB3 respectively. BB1 manages the domain of the 
service consumer and BB2 manages the domain of the service provider. The policy 
provided to BB2 states that BB1 is not a financially stable client and thus only 
Premium2, Premium3 and Premium4 of the four templates can be negotiated with it. 
BB1 selects one of these and negotiates the values of the variable parameters using the HTML 
page shown in Fig. 4. Before BB2 can make a commitment, it performs conversions 
and conflict detection until an implementable router configuration is derived. 

We do not yet measure traffic quantitatively to check SLO compliance. Because this 
is a necessary feature, we plan to implement it as a separate module in the near future. 
For now, we perform a simple qualitative measurement to check SLO compliance. 




Fig. 5. Three domains network testbed for experiments of SLO registration & negotiation 



4.3 Implementation Features and Lessons Learnt 

During these experiments many problems were encountered. We briefly write about 
some important implementation features and lessons learnt during our experiments. 
Fig. 6. Portion of Information Model/Schema (Containment Hierarchy) to Store SLOs 

Fig. 7. Portion of QoS Policy Information Model (Containment Hierarchy) 



4.3.1 SLA Information Model and Schema 

To store SLOs, we designed an information model and a schema. During our 
experiments we discovered that SLO mirroring is required, which was not anticipated 
at the beginning. By mirroring, we mean that a copy of the NSLO may be held by both 
service providers and consumers for verifying NSLO compliance and for accounting 
purposes. We plan to propose our information model for standardization after making 
modifications in the light of feedback from our experiments. Due to the limitation of 
space and scope, we show only a portion of our information model in Fig. 6. 

4.3.2 Policy Framework 

We have used a subset of the Core Information Model [10] and the QoS Information Model 
[11] proposed in the IETF, with slight modifications. These modifications are necessary to 
overcome the limitations of direct attachment of some object classes. For these 
experiments we have used only direct attachment and do not use reusable objects. 
Accordingly, the scope of some attributes is modified so that they also apply to the attached 
object classes. A portion of the object classes of the QoS Policy Information 
Model is shown in Fig. 7. 



5 Summary and Conclusion 

In this paper, we have proposed a new concept of SLO templates, which consist mainly 
of constant and variable parameters. We proposed some SLO templates for the 
Premium service as an example, intended only to check the implementation. We 
described the implementation of these templates in our BB and the experiments 
performed using our implementation. 

The concept of SLO templates can be used to easily create many useful services for 
the mission critical applications of businesses. We expect that with long-term use and 
popularity, some of the templates will become standardized services, some will be 
discarded and the others will be employed in local use. 



6 Future Work 

The following are the main themes related to the material presented in this paper that 
we will focus on in the near future. More research is required before these can be 
implemented in our BB. 



6.1 Enhancement of SLA 

The SLO templates proposed in this paper are all related to the EF PHB of DiffServ. This 
service is called the Premium service. In our present proposed SLO templates and their 
implementation, we do not consider selection of a route. However, it is obvious that 
QoS routing within a domain and between domains is a necessity from the point of 
view of optimum resource utilization and from the point of view of policy. We want 
to investigate the possibility of including route selection parameters in our proposed 
SLO templates and evaluate their impact using experiments. 
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6.2 Traffic Measurement for SLA Compliance 

Generally speaking, in all business activities, when an SLA is agreed between 
two entities, the service provider needs to produce proof of compliance with the 
SLA. The same holds true for an SLA for network QoS. Traffic needs to be measured 
and the results need to be provided to the service consumers for compliance as well 
as accounting purposes. Note, however, that by traffic measurement we mean not 
only counting packets but also detecting faulty links. Traffic measurement is also 
necessary for service providers in order to determine the capacity of their future 
networks. 



Abstract. Network service providers contract with network owners for 
connection rights, then offer individual users network access at a price. 
Within this hierarchy, the service provider must carefully provision and 
allocate (price) network resources (e.g. bandwidth). However, determining 
the appropriate amount to provision and allocate is problematic due 
to the unpredictable nature of users and market interactions. This paper 
introduces methods for optimally provisioning and pricing differentiated 
services. These methods maximize profit while maintaining a low blocking 
probability for each service class. The analytical results are validated 
using simulation under variable conditions. Furthermore, experimental 
results demonstrate that higher profits can be obtained through 
shorter connection contracts. 



1 Introduction 

The Internet continues to evolve from its small and limited academic origins to 
a large distributed network interconnecting academic and commercial institutions. 
In this distributed environment, individual users rely on network service 
providers for network access [?]. Network service providers contract with network 
owners for connection rights (large amounts over long periods of time), then offer 
individual users network access (small amounts over short periods of time) at a 
price. Within this hierarchy, the service provider must carefully provision and 
allocate (price) network resources (e.g. bandwidth). However, determining the 
appropriate amount to provision and allocate is problematic due to the 
unpredictable nature of users and market interactions. Furthermore, provisioning and 
allocation are more complex with Differentiated Service (DS) enabled networks, 
since multiple Quality of Service (QoS) classes exist. 

It has been demonstrated that resource pricing is an efficient mechanism for 
resource management, optimal allocations, and revenue generation [?]. 
However, the majority of these methods are not based on a market hierarchy, 
and do not consider how to provision resources. Other work has investigated 
DS resource provisioning [?], but not retail pricing. In contrast, this paper 
addresses these questions (provisioning and pricing) together within the context 
of a DS enabled network consisting of hierarchical markets [?]. Goals include 
maximizing profit, as well as maintaining a low blocking probability. 

The remainder of this paper is structured as follows. Section 2 describes the 
general design of the hierarchical market economy. Service provider provisioning 
and allocation strategies that maximize profit and reduce the blocking 
probability are presented in section 3. In section 4, the economy is demonstrated 
under variable conditions, and the monetary advantage of shorter term service 
level agreements is presented. Finally, section 5 provides a summary of the 
hierarchical market economy and discusses some areas of future research. 

2 The Hierarchical Market Model 








[Figure 1: diagram with the labels "retail market" and "wholesale market".] 

Fig. 1. Example hierarchical market economy consisting of users, service providers, 
and domain brokers. 

As seen in figure 1, the network model is composed of three types of entities 
(users, domain brokers, and service providers) and two types of markets 
(wholesale and retail). An individual user, executing an application, requires 
bandwidth of a certain QoS class along a path. Users may start a session at 
any time, request different levels of QoS, and have varying session lengths. 
Furthermore, users desire immediate network access (minimal reservation delay). 
In contrast, the domain broker owns large amounts of bandwidth (or rights to 
bandwidth) and is only interested in selling large DS connections¹. The service 
provider plays a very important role in the network economy. Interacting 
with users and domain brokers, the service provider purchases bandwidth from 
domain brokers (provisioning), then re-sells smaller portions to individual users 
(allocation). Buying and selling occurs in two different types of markets: the 
wholesale market and the retail market. 

2.1 Network Resource Markets 

In our network economy, service level agreements for future DS connections are 
bought and sold in the wholesale market. These forward contracts represent large 
bandwidth amounts over long periods of time [?]. Domain brokers sell contracts 
for large DS connections, with an associated Service Level Agreement (SLA), 
across a specific network. An offer specifies the location, delivery date, class q, 
price $g_q$, and term. The market then attempts to match a buyer with the seller 
and a forward contract is created. This is how bandwidth is currently traded 
in many on-line commodity markets, such as RateXchange and Interxion. If a 
service provider agrees to purchase a DS connection of capacity $s_q$, the associated 
cost is $g_q \cdot s_q$ for the agreed term. 

The retail market consists of a service provider selling portions of the DS 
connections purchased in the wholesale market to individual users. The price of 
DS bandwidth will be usage-based, where the user cost depends on the current 
price and the amount consumed. We will use prices based on slowly varying 
parameters such as Time of Day (ToD) statistics [?], as seen in figure 2. A 
day will be divided into T equal-length periods of time, where t = 1, ..., T. To 
provide predictability, these prices (for the next day) are known a priori by the 
users via a price schedule $\{p_{q,t}\}$, where $p_{q,t}$ is the price of class q 
bandwidth during ToD period t. The bandwidth of DS connection q is sold on a 
first come, first served basis; no reservations are allowed. Assume a user requires 
an amount of bandwidth $b_q$ of service class q for the duration of their session. 
If the amount is not available at the beginning of the session, the user is 
considered blocked. However, users who cannot afford $b_q$ are not considered 
blocked. Therefore, it is important to price bandwidth to maximize profit as well 
as maintain a low blocking probability. 
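
The retail admission rule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class name `RetailLink`, the method `admit`, and the per-user `budget` argument are assumptions introduced here, and affordability is modelled simply as price times bandwidth not exceeding the budget.

```python
from dataclasses import dataclass


@dataclass
class RetailLink:
    """One DS connection of class q, resold first come, first served."""
    capacity: float          # provisioned bandwidth s_q (e.g. Mbps)
    in_use: float = 0.0      # bandwidth currently allocated to sessions
    blocked: int = 0         # sessions refused for lack of capacity
    priced_out: int = 0      # sessions whose user could not afford b_q

    def admit(self, b: float, price: float, budget: float) -> bool:
        """Try to start a session needing bandwidth b at the current ToD price."""
        if price * b > budget:
            # The user cannot afford b: by definition this is not blocking.
            self.priced_out += 1
            return False
        if self.in_use + b > self.capacity:
            # Affordable, but not available at session start: blocked.
            self.blocked += 1
            return False
        self.in_use += b
        return True


link = RetailLink(capacity=3.0)
print(link.admit(2.0, price=16.0, budget=100.0))  # admitted
print(link.admit(2.0, price=16.0, budget=100.0))  # blocked, only 1.0 left
print(link.admit(1.0, price=16.0, budget=5.0))    # priced out, not blocked
```

Keeping the two counters separate mirrors the distinction drawn in the text: only capacity refusals contribute to the blocking probability.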

¹ Therefore, a session is a small amount of bandwidth (appropriate for a single user 
or application) compared to a connection. 

Fig. 2. Example Time of Day (ToD) changes in retail demand. 

3 Optimally Provisioning and Allocating Network Resources 

In this section, optimal provisioning and retail pricing methods are developed for 
the service provider that maximize profit and reduce the blocking probability. 
The profit maximization behavior of the service provider is constrained by 
both markets. To maximize profit, the service provider will seek to make the 
difference between the total revenue and the total cost as large as possible. The 
revenue from the retail market for class q during ToD period t is 
$r_{q,t} = p_{q,t} \cdot d_{q,t}(p_{q,t})$, where $d_{q,t}(p_{q,t})$ is a convex 
function representing the aggregate retail market demand for service class q during 
ToD period t at price $p_{q,t}$. As described in section 2.1, the cost of service 
class q is given from the wholesale market as $c_q = g_q \cdot s_q$. Assume the SLA 
term for each connection q is N consecutive ToD periods; therefore the supply 
$s_q$ during ToD periods t = 1, ..., N is constant. From the revenue and cost, the 
profit maximization problem for class q can be written as 

$$\max \sum_{t=1}^{N} \left( r_{q,t} - c_q \right) \qquad (1)$$

where profit maximization is over the SLA term. Viewing this as an optimization 
problem, the first-order conditions are 

$$\sum_{t=1}^{N} \frac{\partial r_{q,t}}{\partial s_q} = \sum_{t=1}^{N} \frac{\partial c_q}{\partial s_q} \qquad (2)$$

The left-hand side of equation 2 is also referred to as the marginal revenue, which 
is the additional revenue obtained if the service provider is able to sell one more 
unit of DS bandwidth. The right-hand side of equation 2 is referred to as the 
marginal cost. This is the additional cost incurred by purchasing one more unit 
of DS bandwidth from the wholesale market. The service provider must purchase 
(provision) bandwidth from the wholesale market and price bandwidth in the 
retail market so that the marginal revenue equals the marginal cost, as seen in 
figure 3. If this is done, the profit is maximized and the blocking probability is 
zero. 




Fig. 3. The service provider seeks the point where the marginal revenue equals the 
marginal cost. If the optimal provisioning and retail pricing occurs, the amount of 
profit is given in the shaded area. (Demand data taken from the experimental section) 



3.1 Single ToD Wholesale Provisioning and Retail Pricing 

In this section, the optimal amount of bandwidth to provision for a single class q 
for one ToD period t will be determined (the q and t subscripts will be dropped 
for brevity). Assume the aggregate retail market demand at the retail price p 
has a Cobb-Douglas form [?], 

$$d(p) = \beta \cdot p^{-\alpha} \qquad (3)$$

where $\beta$ and $\alpha$ are constants describing the aggregate wealth and 
price-demand elasticity, respectively. Price-demand elasticity represents the percent 
change in demand in response to a percent change in price². The larger the 
price-demand elasticity value, the more elastic the demand [?]. The Cobb-Douglas 
demand curve is commonly used in economics because the elasticity is constant, 
unlike linear demand curves [?]. This assumes users respond to proportional instead 
of absolute changes in price, which is more realistic. Therefore, this demand 
function is popular for empirical work. For example, the Cobb-Douglas demand 
function has been successfully used for describing Internet demand in the INDEX 
Project [?]; therefore, we believe this curve is also appropriate for the retail 
market. Given the aggregate demand function, the revenue earned is 

$$p \cdot d(p) = p \cdot \beta \cdot p^{-\alpha} = \beta \cdot p^{1-\alpha} \qquad (4)$$

² Typically elasticity is represented as a negative value, since demand and price move 
in opposite directions. However, the sign is already incorporated in the demand 
equation. 

Alternatively, the revenue earned by the service provider can be written in terms 
of the demand as 

$$p \cdot d(p) = \left[ \beta^{1/\alpha} \cdot [d(p)]^{-1/\alpha} \right] \cdot d(p) = \beta^{1/\alpha} \cdot [d(p)]^{1 - 1/\alpha} \qquad (5)$$

As previously described, the marginal revenue is the first derivative of the 
revenue equation with respect to the demand; therefore, the marginal revenue is 

$$\beta^{1/\alpha} \cdot \left( 1 - \frac{1}{\alpha} \right) \cdot [d(p)]^{-1/\alpha} \qquad (6)$$

The cost function for bandwidth is $g \cdot s$ and the marginal cost is g. From 
equation 2, the service provider maximizes profit when the marginal revenue equals 
the marginal cost, 

$$\beta^{1/\alpha} \cdot \left( 1 - \frac{1}{\alpha} \right) \cdot [d(p)]^{-1/\alpha} = g \qquad (7)$$

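Solving for $d(p)$, the optimal amount to provision $s^*$ is 

$$s^* = \left[ \frac{g}{\beta^{1/\alpha} \cdot \left( 1 - \frac{1}{\alpha} \right)} \right]^{-\alpha} \qquad (8)$$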



During the wholesale market auction, the service provider can use equation 8 
to determine the bid amount at the offered price g. Once the auction has closed, 
the service provider must price bandwidth for the retail market. The optimal 
retail price $p^*$ is 

$$p^* = \left( \frac{\beta}{s^*} \right)^{1/\alpha} \qquad (9)$$

This price causes the demand (equation 3) to equal the supply (equation 8); 
therefore, the predicted blocking probability is zero (discussed in section 3.3). 

The validity of the derived equations can be examined at infinite and unity 
elasticity. User demand will become very elastic ($\alpha$ approaches $\infty$) if 
there is a large selection of service providers (large service provider competition 
drives profits to zero). In contrast, if the service provider has a monopoly, the 
elasticity approaches 1 and profits increase [?]. From equations 8 and 9, the 
optimal revenue $p^* \cdot s^*$ under these two extreme cases is as predicted: 

$$\lim_{\alpha \to \infty} \beta \cdot \left( \frac{1 - 1/\alpha}{g} \right)^{\alpha - 1} = 0 \qquad (10)$$

$$\lim_{\alpha \to 1^{+}} \beta \cdot \left( \frac{1 - 1/\alpha}{g} \right)^{\alpha - 1} = \beta \qquad (11)$$
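
The single-period results can be checked numerically with a short sketch. The function names and the sample values ($\beta = 1000$, $\alpha = 2$, $g = 4$) are assumptions chosen for illustration and do not come from the paper's experiments. Note that the $\alpha \to \infty$ limit tends to zero when $g > 1$, which the sample value satisfies.

```python
import math


def optimal_supply(beta: float, alpha: float, g: float) -> float:
    """Equation (8): supply at which marginal revenue equals marginal cost g."""
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a finite optimum")
    return (g / (beta ** (1.0 / alpha) * (1.0 - 1.0 / alpha))) ** (-alpha)


def optimal_price(beta: float, alpha: float, s: float) -> float:
    """Equation (9): price at which Cobb-Douglas demand equals the supply s."""
    return (beta / s) ** (1.0 / alpha)


def demand(beta: float, alpha: float, p: float) -> float:
    """Equation (3): aggregate Cobb-Douglas demand at price p."""
    return beta * p ** (-alpha)


def optimal_revenue(beta: float, alpha: float, g: float) -> float:
    """Revenue p* times s* at the optimum."""
    s = optimal_supply(beta, alpha, g)
    return optimal_price(beta, alpha, s) * s


beta, alpha, g = 1000.0, 2.0, 4.0      # illustrative values only
s_star = optimal_supply(beta, alpha, g)
p_star = optimal_price(beta, alpha, s_star)
print(s_star, p_star)                  # about 15.625 and 8.0
print(demand(beta, alpha, p_star))     # equals s_star, so no blocking
print(optimal_revenue(beta, 50.0, g))      # very elastic: close to 0
print(optimal_revenue(beta, 1.0001, g))    # near-monopoly: close to beta
```

With these values the optimal price also equals $g \cdot \alpha / (\alpha - 1)$, which follows from substituting equation 8 into equation 9.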



3.2 Multiple ToD Wholesale Provisioning and Retail Pricing 

This section considers provisioning for a single class q (the q subscript will be 
dropped for brevity) over N consecutive ToD periods. These consecutive ToD 
periods represent the agreed SLA term from the wholesale market. As described 
in section 3.1, assume the aggregate retail market demand during ToD period t 
at the retail price p has a Cobb-Douglas form, 

$$d_t(p) = \beta_t \cdot p^{-\alpha_t} \qquad (12)$$

where $\beta_t$ and $\alpha_t$ are constants describing the aggregate wealth and 
elasticity, respectively, for ToD period t. The aggregate wealth and elasticity can 
change from one ToD period to the next. As described in section 3, the service 
provider maximizes profit when the marginal revenue equals the marginal cost. Over 
multiple ToD periods this is 

$$\sum_{t=1}^{N} \beta_t^{1/\alpha_t} \cdot \left( 1 - \frac{1}{\alpha_t} \right) \cdot [d_t(p)]^{-1/\alpha_t} = N \cdot g \qquad (13)$$



To determine the optimal supply $s^*$, we must solve equation 13 for $d_t(p)$, 
which equals the constant supply in every ToD period of the SLA term. However, 
since the equation is non-linear, a direct solution cannot be found. For this 
reason, gradient methods (e.g. Newton-Raphson) can be used to determine the 
optimal provisioning amount [?]. Because of the wholesale market auction 
negotiation time, this calculation can be performed off-line; therefore, 
convergence time is not critical. Once the auction has closed, the optimal price 
for ToD period t is 

$$p^*_t = \left( \frac{\beta_t}{s^*} \right)^{1/\alpha_t} \qquad (14)$$



Therefore, in the multiple ToD case, the supply for each ToD period is constant, 
while the price may vary, as seen in figure 4. 
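
The multiple-ToD calculation can be sketched with a Newton-Raphson iteration on the constant supply s. This is an illustrative sketch under stated assumptions, not the authors' solver: the function names, the tolerance, and the choice of starting point (the smallest single-period optimum, where the summed marginal revenue is at least N times g) are introduced here.

```python
def single_period_optimum(beta, alpha, g):
    """Equation (8) applied to one ToD period."""
    return (g / (beta ** (1 / alpha) * (1 - 1 / alpha))) ** (-alpha)


def optimal_supply_multi(betas, alphas, g, tol=1e-10, max_iter=100):
    """Solve equation (13) for the constant supply s by Newton-Raphson."""
    n = len(betas)
    # Start at or below the root so the iterates increase towards it.
    s = min(single_period_optimum(b, a, g) for b, a in zip(betas, alphas))
    for _ in range(max_iter):
        f = -n * g          # f(s) = sum of marginal revenues - N * g
        df = 0.0            # derivative of f with respect to s
        for b, a in zip(betas, alphas):
            k = b ** (1 / a) * (1 - 1 / a)
            f += k * s ** (-1 / a)
            df += k * (-1 / a) * s ** (-1 / a - 1)
        step = f / df
        s -= step
        if abs(step) <= tol * s:
            return s
    raise RuntimeError("Newton-Raphson did not converge")


def optimal_prices(betas, alphas, s):
    """Equation (14): one retail price per ToD period for supply s."""
    return [(b / s) ** (1 / a) for b, a in zip(betas, alphas)]


# Illustrative two-period example: wealth quadruples in the second period.
betas, alphas, g = [1000.0, 4000.0], [2.0, 2.0], 4.0
s_star = optimal_supply_multi(betas, alphas, g)
print(s_star)                                  # about 35.156
print(optimal_prices(betas, alphas, s_star))   # about 5.33 and 10.67
```

The example shows the behaviour stated above: a single supply value for the whole term and a higher price in the period with higher demand.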



3.3 Retail Market Demand Estimation and Blocking Probabilities 

As described in sections 3.1 and 3.2, determining the optimal amount of 
bandwidth to provision and the retail price requires knowledge of the retail demand 
curve. However, due to the dynamic nature of the retail market, demand can 
change over time. Such changes may reflect ToD trends, pricing, or the 
introduction of new technology. For this reason, demand prediction and estimation 
will be employed [?], where the demand curve parameters ($\alpha$ and $\beta$) are 
estimated using previous ToD measurements. The other goal for the service provider 
is to maintain a low blocking probability. The optimal provisioning and pricing 
equations given in the previous two sections result in supply equaling demand 
(as seen in figure 3), yielding a zero blocking probability. However, if the 
estimated demand is less than the actual demand, then the blocking probability 
will be greater than zero. Therefore, a zero blocking probability depends on 
accurate demand estimation, which will be demonstrated in the next section. 
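
The text does not state which estimator is used. One common choice, shown here purely as an assumption, is ordinary least squares on the logarithms, since $\ln d = \ln \beta - \alpha \ln p$ is linear in $\ln p$. The function name `fit_cobb_douglas` and the sample measurements are hypothetical.

```python
import math


def fit_cobb_douglas(prices, demands):
    """Estimate (beta, alpha) of d = beta * p**(-alpha) by log-log least squares."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(d) for d in demands]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need measurements at two or more distinct prices")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    alpha = -slope                              # elasticity
    beta = math.exp(mean_y - slope * mean_x)    # aggregate wealth
    return beta, alpha


# Hypothetical (price, demand) measurements from earlier ToD periods.
prices = [4.0, 8.0, 16.0]
demands = [1000.0 * p ** -2.0 for p in prices]
beta_hat, alpha_hat = fit_cobb_douglas(prices, demands)
print(beta_hat, alpha_hat)   # close to 1000 and 2 for this noise-free data
```

With noisy measurements the estimates deviate from the true parameters, which is how underestimated demand can lead to the nonzero blocking probability described above.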

4 Experimental Results 

In this section, the optimal provisioning and pricing techniques described in 
section 3 are investigated under variable conditions using simulation. The 
experiments simulated 6 days, where each ToD was 8 hours in duration (3 ToD 
per day). The model consisted of 200 users, a domain broker, and a service 
provider. Users had an elasticity $\alpha$ uniformly distributed between 1.1 and 
2.75, and a wealth $\beta$ uniformly distributed between 1 x 10® and 3.5 x 10®. 
Furthermore, the demand of each user $b_t$ was uniformly distributed between 
0.5 Mbps and 2 Mbps (consistent with multimedia traffic). Each day, users started 
their sessions at random times using a Poisson distribution with mean equal to 
the first ToD of that day. This distribution caused the second ToD period of each 
day to have a high utilization (simulating peak hours). Two separate experiments 
were performed. The first experiment assumed the SLA term was equal to 6 days, 
while the second experiment assumed the SLA term was equal to one ToD. 




Fig. 4. Retail provisioning, allocation, and pricing simulation results for a six day SLA. 



Figure 4 shows the provisioning, allocation, and pricing results when the 
SLA term was 6 days. As seen in this figure, the provisioned amount was 36.2 
Mbps for the duration of the simulation, while the price per ToD varied from 
16.0 to 33.8. Prices during the second ToD of each day were high, since the 
demand was higher (peak demand). In contrast, the prices for the other ToD 
periods were low to encourage consumption. The total profit for the simulation 
was 1.54 X 10^®. The blocking probability was nonzero for ToD periods 5, 8 and 
11. During these peak ToD periods, the predicted demand was less than the 
actual demand. 

Fig. 5. Retail provisioning, allocation, and pricing simulation results for eighteen 
SLAs (each term equaled one ToD). 

Figure 5 shows the provisioning, allocation, and pricing results when the SLA 
term was one ToD (18 consecutive SLAs were contracted). In this simulation, 
the service provider could provision and price bandwidth for each ToD period. 
The bandwidth provisioned ranged from 1.7 Mbps to 90.0 Mbps, while the retail 
price ranged from 16 to 45.5. The total profit was 3.39 x 10^^, over twice as 
high as with the 6 day SLA. Therefore, shorter SLA terms gave the service provider 
more control (provisioning and pricing), which increased profits. As in the 
other experiment, the blocking probability was nonzero for some ToD periods, here 
periods 4, 7, 10, and 13. Again, this indicates the predicted demand was too small 
for these periods. However, these were the first ToD periods of the day (non-peak). 

5 Conclusions 

Network services are typically provided through a hierarchical market economy. 
This paper introduced a hierarchical economy consisting of two types of markets 
(retail and wholesale) and three types of entities (service provider, domain 
broker, and users). Within this market hierarchy, the service provider must 
carefully provision resources from the wholesale market and allocate resources in 
the retail market. The service provider seeks to maximize profit and maintain a 
low blocking probability. However, achieving these objectives is problematic due 
to the unpredictable nature of the markets. This paper defined optimal 
buying/selling strategies that maximize profit while maintaining a low blocking 
probability per DS connection. These methods rely on retail market estimation to 
determine the appropriate retail market supply and the retail market price. 
Simulation results were provided to demonstrate the optimal provisioning and retail 
pricing methods presented in this paper. The service provider was able to maximize 
profit given the estimated user demand and the SLA term. Shorter SLA terms were 
shown to yield higher profits, since the service provider is able to provision 
precisely based on the ToD statistics. Future work includes investigating sampling 
procedures and providing retail bandwidth guarantees. 



Abstract. Television (T)-Commerce is not just a buzzword. Consumer and 
business demands for multimedia services have led to a proliferation of 
solutions that provide a wide range of services such as digital television, 
Internet connectivity, hidden-programme placements and others. Digital Video 
Broadcasting (DVB) networks, which employ the MPEG-2 compression and 
transmission standard, are used for digital television. Advancements in the 
network infrastructure as well as in Set-Top Box technology have led to 
advanced service models which enable the user to interact actively with the 
linear presentation model offered in a broadcasting environment. This paper 
provides a technological overview of how interactivity can be added to a DVB 
network. Various scenarios with different levels of interactivity are presented. 
For each scenario, the user and content provider issues are also briefly 
discussed. The final part presents a vision of how interactivity in a television 
environment could look in ten years. 



1. Introduction 

In the early stages of Digital Video Broadcasting (DVB) its usage was limited to the 
playout and transmission of compressed video/audio streams according to the 
MPEG-2 standard [1]. The user's choice was limited to the selection of video/audio 
streams. Advancements in the network infrastructure, Set-Top Box technology and 
content authoring techniques have led to advanced service models which enable the 
user to interact actively with the linear presentation model offered in a DVB 
environment. At the same time new business models have been derived, offering 
services such as 

• Internet access 

• interactive advertisement 

• personalised purchasing 

• decentralised games 

The pure one-to-many broadcasting communication model is augmented by either 
a one-to-one (single user feedback to a server) or a many-to-many (Internet) 
communication model. All the services listed above challenge both the user interface 
design and the technological aspects of the synchronisation between the video stream 
and some additional data. Since synchronisation of data with video plays a major role, 
its technical aspects will be discussed in more detail in the following chapters. 
Chapter two starts with a brief overview of the technical components of a DVB 
network. The functionality and technical realisation of the Near Video on Demand 
(NVoD) service are explained as an example. In chapter three the various 
synchronisation means for providing an interactive data service in conjunction with 
the video stream are discussed. The last chapter provides a vision of possible services 
and interaction scenarios the user might be faced with in the future. 



2. Early Service Models for Digital Video Broadcasting Systems 

Digital Video Broadcasting systems are built upon the MPEG-2 standard. The 
primary function of Set-Top Boxes (STB) supporting early service models was to 
receive, select and display digital-quality videos. Broadcast video streams are 
characterised by the fact that they are no longer accessible after the 
moment they have been broadcast (limited reception model). Within the video 
streams, specific signalling information in the form of tables [2] [3] is 
multiplexed to provide the STB with the information required for synchronisation, 
tuning and program selection. This information must be available throughout the 
duration of the program. Because anyone can enter the program at any time, the 
signalling information is played out using a data carousel mechanism. 
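
The data carousel idea can be illustrated with a small sketch: the tables are repeated cyclically, so a receiver that tunes in at any point still collects every table within one cycle. This is a toy model and not DVB section syntax; the function names are hypothetical, and the table names PAT, PMT and EIT stand in for the signalling tables, with placeholder payloads.

```python
from itertools import cycle, islice


def carousel(tables):
    """Repeat the (name, payload) tables forever in a fixed order."""
    return cycle(tables)


def acquire(stream, needed, max_reads=1000):
    """Read from the carousel until every table named in `needed` was seen."""
    seen = {}
    for reads, (name, payload) in enumerate(stream, start=1):
        if name in needed:
            seen.setdefault(name, payload)
        if len(seen) == len(needed):
            return seen, reads
        if reads >= max_reads:
            break
    raise TimeoutError("carousel did not deliver all needed tables")


tables = [("PAT", b"..."), ("PMT", b"..."), ("EIT", b"...")]
# A receiver tuning in late joins the carousel two tables into the cycle.
late_joiner = islice(carousel(tables), 2, None)
seen, reads = acquire(late_joiner, {"PAT", "PMT", "EIT"})
print(sorted(seen), reads)   # all three tables after one full cycle
```

The worst-case acquisition delay is therefore one carousel period, which is why the repetition rate of the tables matters for tuning time.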

The Near Video on Demand (NVoD) service, which offers the same program with 
time-shifted beginnings, is a typical example of local interactivity. For this 
scenario films are played out in an overlapped fashion. The necessary signalling 
information for selecting the video stream, obtaining the start time and further 
information about the film is contained in the so-called Event Information Tables 
(EIT), which are broadcast with a repetition rate of 2 seconds. The EIT contain a 
so-called reference event and a number of time-shifted events equal to the number 
of time-shifted video streams. The reference event contains information about the 
name of the movie and a textual description of its content. The time-shifted event 
provides information about the start time and the duration of the movie. Figure 1 
shows a screen shot of an Electronic Program Guide offering interactivity by 
selecting different time-shifted video streams. 




Fig. 1. EPG display for selecting different beginnings of a movie 

In this model the degree of interaction for the user is limited to the selection of 
video and audio streams. The selection of a time-shifted service is based upon the 
start time contained in the EIT. The user interacts actively with the user interface 
of the STB, but no interaction based upon the content of the video stream is possible. 

Network Issues 

From the network point of view the Digital Video Broadcasting network acts as a 
large serial disk for the storage of signalling information as well as the video/audio 
streams. This kind of communication model has no connection or log-on latencies. 
The required bandwidth can be calculated in advance and remains relatively static 
for the duration of the event. 

User Interface Issues 

In general the user selects one of the feeds to be displayed on the television at a 
certain instant of time. The user interface should be designed so that a feed can 
be selected with very few key strokes, e.g. using coloured buttons on the remote 
control. The user interface should also present the various starting times to the 
viewer and, once the user has tuned to the service, select the feed that started 
most recently. 
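
The default feed selection described above can be sketched as follows. The record type `TimeShiftedEvent` and its fields are hypothetical stand-ins for the start time and duration carried in the time-shifted events of the EIT; times are given in seconds since midnight for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimeShiftedEvent:
    feed: int        # index of the time-shifted video stream
    start: int       # start time, seconds since midnight
    duration: int    # duration of the movie in seconds


def most_recently_started(events: List[TimeShiftedEvent],
                          now: int) -> Optional[TimeShiftedEvent]:
    """Return the running event with the latest start time, if any."""
    running = [e for e in events if e.start <= now < e.start + e.duration]
    if not running:
        return None
    return max(running, key=lambda e: e.start)


# A two-hour film offered on four feeds, staggered by 15 minutes from 20:00.
events = [TimeShiftedEvent(feed=i, start=72000 + 900 * i, duration=7200)
          for i in range(4)]
choice = most_recently_started(events, now=74000)   # 20:33:20
print(choice)   # the feed that started at 20:30
```

A viewer tuning in at 20:33 is thus placed at most a few minutes into the film, and the other feeds remain selectable with the coloured buttons.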



3. Interactivity by Adding Data Service to the Video Service 

3.1 Introduction 

An enhancement of the pure video selection can be achieved by adding a data stream 
to a video stream. Regardless of the representation environment or complexity of the 
data (e.g. Web-Browser/HTML-pages, Image-Viewer/Images) there must be a 
synchronisation mechanism which bundles the video stream with the additional data. 
Media convergence as described here automatically calls for media synchronisation. 
The degree of granularity of the synchronisation can vary from event based to frame 
based or even to the object level. Enriching and synchronising the video 
stream with data has the potential to create attractive consumer services 
offering new business models. Apart from the unknown social impact of such 
scenarios, legal aspects are another issue which may raise further questions. These 
two aspects will not be further investigated in this paper. 

Depending upon the service to be offered, synchronisation requirements may be 
either loose, which means within seconds, or tight, which means frame accurate or 
even object accurate within a frame. 

An example of a loose synchronisation would be an icon that appears a few seconds 
after the start of an event, allowing further interaction. A loose synchronisation is 
typically event driven, relying upon the DVB signalling information mechanism, and 
does not require time stamps carried within the additional data stream. Within this 
signalling information an absolute time could be carried for the synchronisation. The 
next degree of granularity concerns frame based data synchronisation. This 
method requires the presence of time stamps which are correlated to the video stream 
time line. An example of a tight synchronisation could be links that are connected to 
frames (scenes) or even follow objects displayed on the television screen. Since the 
synchronisation aspects play a major role in providing interactivity in a DVB 
environment, three different synchronisation mechanisms will be discussed in more 
detail in the following subchapters. 



3.2 Event Based Synchronisation of Data Streams 

For this kind of scenario the additional data are synchronised upon the occurrence of a 
certain event of a program. At BetaResearch a signalling protocol has been developed 
which enables the synchronisation of data with the video stream at a granularity of 
one second [4]. Specific tables are broadcast which contain trigger events (start time 
and duration) with an absolute time base. Thus synchronisation within an event is 
possible. In other words, the broadcast video and data are coupled via the absolute 
time base. Figure 2 shows an example of a user interface where the pure video is 
augmented with web based data (e.g. HTML pages) that is related to the video content 
using the BetaResearch signalling protocol. Normally the video occupies the whole 
screen. The availability of additional data is indicated by an icon. If the user clicks on 
the icon, the video is displayed in a smaller area of the screen and a browser window 
opens displaying information related to the video. The user can obtain further 
information about the car/driver by pressing a coloured button on the remote control. 
The additional data (e.g. HTML pages) is either broadcast (walled garden) or could be 
retrieved from a server administered by the service provider. For the walled garden 
approach the additional data is broadcast by means of a data carousel according to the 
schedule of the video stream, or in advance for possible pre-processing at the STB. 
The content of the data carousel may vary during the event. 
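
How an STB might evaluate such trigger events can be sketched as follows. The actual table layout of the BetaResearch protocol is not given in the text, so the `Trigger` record, its fields and the page names are hypothetical; only the ideas of a start time, a duration, an absolute time base and a one-second granularity are taken from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trigger:
    start: int       # absolute start time in whole seconds
    duration: int    # validity period in seconds
    resource: str    # data to present, e.g. an HTML page in the carousel


def active_triggers(table: List[Trigger], now: float) -> List[Trigger]:
    """Return the triggers whose interval contains the current absolute time."""
    t = int(now)     # one-second granularity: sub-second detail is dropped
    return [tr for tr in table if tr.start <= t < tr.start + tr.duration]


table = [
    Trigger(start=1000, duration=30, resource="driver.html"),
    Trigger(start=1020, duration=60, resource="car.html"),
]
print([tr.resource for tr in active_triggers(table, now=1025.7)])
print([tr.resource for tr in active_triggers(table, now=1090.0)])
```

The STB would show the icon while at least one trigger is active and open the referenced page when the user selects it.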




Fig. 2. Display example of a Video-Web Synchronisation 

Network Issues 

A scheduling system at the head end site is necessary to generate and play out the 
tables (signalling information) as well as the actual data (HTML pages) according to 
the program schedule. Authoring of the video content in advance is not necessary. 
Either the multiplexer pre-allocates sufficient bandwidth for the data carousel 
(static bandwidth allocation), or a bandwidth manager controls and allocates the 
bandwidth among the sources (video, data carousel). The bandwidth manager becomes 
necessary if the data carousel varies its size over the duration of the event, 
ensuring an optimal utilisation of the transponder bandwidth. Figure 3 shows an 
example of an architecture of a DVB satellite network offering interactivity by 
means of a data carousel and on-demand data access via the return channel. If a 
satellite network is used for the delivery of both video streams and data requested 
on demand (e.g. HTML pages), a bandwidth allocation scheme is needed to avoid 
possible bandwidth interference among the carousel data, on-demand data streams and 
the video/audio broadcast streams. In [5] a bandwidth allocation scheme has been 
proposed which controls the bandwidth requirement of the response data (over-air 
delivery) by adjusting the TCP window size. 
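
The principle behind window-based rate control can be sketched briefly. A window-limited TCP flow sends at most one window per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below is not the scheme of [5], whose details are not given here; the function names, the equal split among flows and the sample round-trip time of 0.5 s are assumptions.

```python
from typing import List


def window_for_rate(rate_bps: float, rtt_s: float) -> int:
    """Largest window in bytes that keeps one flow at or below rate_bps."""
    return int(rate_bps * rtt_s / 8)


def share_windows(budget_bps: float, rtt_s: float, flows: int) -> List[int]:
    """Split a bandwidth budget equally and return one window per flow."""
    if flows <= 0:
        return []
    per_flow = budget_bps / flows
    return [window_for_rate(per_flow, rtt_s)] * flows


# 2 Mbit/s reserved for on-demand responses, four active requests,
# and an assumed round-trip time of 0.5 s over the satellite path.
windows = share_windows(2_000_000, 0.5, 4)
print(windows)   # 31250 bytes per flow
```

Shrinking the advertised windows when the carousel or the video streams need more capacity reduces the on-demand traffic without dropping connections.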




Fig. 3. DVB-S network architecture supporting interactive data services 



Content Provider Issues 

The walled garden approach provides a more secure environment for the service and 
content provider and is thus easier to control than the free Internet approach. This 
will also lead to more secure revenue streams. The content provider must be aware of 
the presence of a linear model (video stream) and, to some degree, of the interactive 
component. In order to attract the user with this kind of interactivity, a strong 
correlation between the video and the data, taking into account the user's profile, 
should be ensured. For example, a video showing a holiday resort might be augmented 
with additional information covering many aspects such as evening events, tours 
offered or the temperature of the pool. Depending upon the profile of the user 
(hobbies, cuisine etc.) the additional data is filtered and presented in a way that 
matches the user's expectations. Thus it is possible to provide the user with the 
most valuable information, which ensures the highest user satisfaction. 

User Interface Issues 

The user is asked for a different kind of behaviour and attention depending upon the 
screen (video or data interaction) selected. Navigation has to support this dual 
interaction. The dual screen approach (video and interactive data) challenges the 
user interface design, which must support different interaction spaces. The TV viewer 
enjoys leisure, whereas the interactive data screen asks for active involvement by 
the user. Also, the one-to-many communication model (video and carousel data) is 
enriched by a one-to-one communication model (user-server request). 

Within this chapter, synchronisation mechanisms based upon an event level were 
discussed. In the following chapter, content based synchronisation means will be 
discussed. 



3.3 Frame Based Synchronisation of Data 

Due to the temporal nature of video and the fact that there are 25 frames per second
(50 Hz systems), synchronisation on a frame based level becomes a difficult task.
In the worst case a synchronisation and an interactivity point may be defined for each
frame. A shot, which consists of a series of frames, or a scene consisting of hundreds of
frames requires more relaxed synchronisation constraints. In order to enable frame
based synchronisation with additional data, two basic approaches are possible. Both
approaches require an authoring process of the content.

3.3.1 Utilising MPEG-2 Header Parameters 

The first approach utilises the possibility to add data in the MPEG-2 header (“user
data field” parameter). The MPEG-2 header of a compressed video stream is modified
by adding a very limited amount of meta data during the authoring process.¹ The meta
data refers to the actual data to be synchronised with the video stream. At the STB the
decoder could be programmed in such a way that further actions are taken when the
meta data is discovered.
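A minimal sketch of how a decoder-side routine might discover such meta data, assuming it is carried in MPEG-2 video user data, which is introduced by the start code 0x000001B2. The function name and the sample bytes are hypothetical, and the sketch ignores the fact that a real payload must avoid emulating the start code prefix.

```python
START_PREFIX = b"\x00\x00\x01"   # MPEG-2 start code prefix
USER_DATA_CODE = 0xB2            # user_data_start_code value


def extract_user_data(es: bytes) -> list[bytes]:
    """Return the payload of every user data field in an elementary stream."""
    payloads = []
    pos = es.find(START_PREFIX)
    while pos != -1 and pos + 3 < len(es):
        code = es[pos + 3]
        nxt = es.find(START_PREFIX, pos + 4)
        if code == USER_DATA_CODE:
            # The payload runs until the next start code (or end of data).
            end = nxt if nxt != -1 else len(es)
            payloads.append(es[pos + 4:end])
        pos = nxt
    return payloads


# Hypothetical stream: picture start code, one user data field, GOP start code.
sample = (b"\x00\x00\x01\x00\xaa"
          b"\x00\x00\x01\xb2" + b"addinfo_1"
          + b"\x00\x00\x01\xb8\x11")
found = extract_user_data(sample)
```

In an STB the returned payloads would be handed to whatever component interprets the meta data; that component is outside the scope of this sketch.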

3.3.2 Using the MPEG-2 Timeline 

The second approach outlined in [6] focuses on the MPEG-2 timing model which 
means that the data stream maintains the same time line as the MPEG-2 program 
stream. Technically speaking the data is synchronised to the MPEG-2 source. It refers 
to the same program clock reference timeline as the MPEG-2 video stream and thus is 
contained in the same Program Map Table as the video program stream. Depending 
upon the size of the data to be synchronised with the video two technical 
implementations are possible. Either the actual data with some meta data attached is
synchronised to the video as proposed in [6], or, as an extension offering more flexibility
in terms of linking the content (frame) to different data, meta data is used as a
reference to the actual data synchronised with the video stream. This approach also
offers more flexibility if the data is transferred over a network different from that of
the video stream. If the meta data approach is used it must be ensured that the actual
data are received and decoded in time at the STB. The necessary timing relationship
between the video (frame) and the data is established during the authoring process
taking place ahead of time. The outcome is a parameter pair comprising the
Presentation Time Stamp (i.e. a pointer to the video frame) and meta data.

¹ Advanced camera/encoder equipment may allow meta data to be added on line.



Figure 4 depicts an example of synchronising two data streams with a video stream
at t1 and t2 by means of the meta data synchronisation approach using the MPEG-2
time line. The meta data synchronised with the video stream via the presentation
timestamps contain a link to the location of the actual data, depicted in Figure 4 as
addinfo_1 and addinfo_2, which could be e.g. HTML pages. In addition, further
information relevant to the evaluation of the user’s profile may be contained. The data
(e.g. HTML pages) processed at the STB have to be transmitted and cached ahead of
time in order to guarantee frame accuracy.
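The pairing of a Presentation Time Stamp with meta data can be sketched as a small lookup table. MPEG-2 presentation timestamps count ticks of a 90 kHz clock, so at 25 frames per second one frame lasts 3600 ticks. The class name, method names and link names below are hypothetical, and the sketch ignores the wrap-around of the 33-bit PTS counter.

```python
import bisect

PTS_CLOCK_HZ = 90_000                           # MPEG-2 PTS resolution
FRAME_RATE = 25                                 # 50 Hz systems
TICKS_PER_FRAME = PTS_CLOCK_HZ // FRAME_RATE    # 3600 ticks per frame


def frame_to_pts(frame_index: int, first_pts: int = 0) -> int:
    """PTS of a frame, assuming a constant frame rate."""
    return first_pts + frame_index * TICKS_PER_FRAME


class SyncTable:
    """Authoring output: (PTS, link to pre-cached data) pairs."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._pts = [pts for pts, _ in self._entries]

    def due(self, prev_pts: int, cur_pts: int) -> list:
        """Links whose PTS lies in the interval (prev_pts, cur_pts]."""
        lo = bisect.bisect_right(self._pts, prev_pts)
        hi = bisect.bisect_right(self._pts, cur_pts)
        return [link for _, link in self._entries[lo:hi]]


# Hypothetical authoring result: t1 at frame 50, t2 at frame 125.
table = SyncTable([(frame_to_pts(50), "addinfo_1.html"),
                   (frame_to_pts(125), "addinfo_2.html")])
```

Calling `due` once per decoded frame with the previous and current PTS returns the links to trigger at that frame. The linked pages are assumed to be in the STB cache already.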



[Figure 4: time line starting at 0 with a video track and a metadata track; the metadata links add info 1 at t1 and add info 2 at t2.]

Fig. 4. Synchronisation of video and data by means of meta data



3.4 Object Based Synchronisation of Data 

The solution with the finest granularity concerns the synchronisation of data with
objects within frames or series of frames (shots). At this point it is worthwhile to
spend a few words on the MPEG-2 compression algorithm. A transform based
compression algorithm like the MPEG-2 standard, which decomposes images into
square pieces (picture > macroblocks > blocks), does not consider the image
composition by means of objects. Due to this kind of compression, object based
synchronisation becomes difficult. Improved algorithms which model and compress
the object motion rather than trying to encode arrays of blocks of pixels would be
more appropriate. The MPEG-4 standard is an attempt to enable
synchronisation on an object level of the image. In order to synchronise
data upon an object level in an MPEG-2 compression environment, enhanced scene
analysis methods which are part of the authoring process need to be employed.
Generally an object may move along a certain area of the screen for one shot or scene.
If someone wants to synchronise the data with an object, object tracking
mechanisms are needed. In [7] an object segmentation algorithm was described which
is applied to uncompressed video and allows a classification of every pixel in every frame
of a video sequence. The results are data base entries associating marked objects with
some action to be performed when selected by the viewer.

Network Issues 

If the transfer of the data (meta data or actual data enabling interaction) is done on the 
same network as the video is distributed, then either multiplexing is needed or the 
data is transferred within pre-allocated stuffing packets. If the former one is used the 
Program Clock Reference needs to be recalculated. If the latter one is used, a control 
application must be developed ensuring that the data are added synchronously to the 
video stream occupying the stuffing packets. Also the occurrence of the additional 
data must be signalled in the relevant MPEG-2 tables allowing the DEMUX to trigger 
on this data. The transfer of the actual data by a different delivery network would be 
an additional possibility. 
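The stuffing-packet option can be sketched as follows, assuming an MPEG-2 transport stream of 188-byte packets in which stuffing is carried as null packets with PID 0x1FFF. Because only null packets are overwritten, the multiplex rate stays unchanged. The helper names and the PID values 0x100 and 0x200 are hypothetical. The sketch does not maintain continuity counters, nor does it signal the data PID in the MPEG-2 tables; a real control application would need to do both.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF      # PID reserved for null (stuffing) packets


def pid_of(packet: bytes) -> int:
    """Extract the 13-bit PID from a transport stream packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def make_packet(pid: int, payload: bytes) -> bytes:
    """Build a simplified payload-only packet (continuity counter fixed at 0)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    body = payload[:TS_PACKET_SIZE - 4]
    return header + body.ljust(TS_PACKET_SIZE - 4, b"\xff")


def fill_stuffing(stream: bytes, data_packets: list) -> bytes:
    """Overwrite null packets with queued data packets, in order."""
    out = bytearray()
    queue = list(data_packets)
    for off in range(0, len(stream), TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        if queue and pid_of(pkt) == NULL_PID:
            pkt = queue.pop(0)
        out += pkt
    return bytes(out)


video = make_packet(0x100, b"v")
null = make_packet(NULL_PID, b"")
data = make_packet(0x200, b"addinfo_1")
muxed = fill_stuffing(video + null + video + null, [data])
```

Here the first null packet is replaced by the data packet and the second one is left as stuffing, since no further data is queued.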

Content Provider Issues 

The linear model of video presentation is still left intact by allowing interactivity. The
strength of allowing interactivity based upon a frame or object of the video offers new
business opportunities. It makes it possible to promote new business models such as
in-program sponsorship (e.g. links providing further information might be sold), hidden
product placement (e.g. links hidden behind objects which offer buying options might
be sold) or personalised advertisement targeting, all with the potential to create
additional revenue streams for the content provider as well as the service provider. An example of
in-program sponsorship could be to sell links to further information triggered upon
frames or objects. An example of hidden product placement could be to offer buying
options hidden behind frames, scenes or objects. In other words, it enables broadcasters
to offer additional program sponsorship deals. Offering a new business
automatically raises the question of how to quantify the new service, i.e. what kind of
charging model is acceptable for all parties appearing in the value chain. Apart
from the business aspects, legal issues may also arise with this kind of new service
model, e.g. is hidden information leading to an ad legally treated like a real ad or not,
simply because the user is asked to turn the hidden one into a real one?

User Interface Issues 

TV program-metaphor based interfaces usually require fewer actions than well-known
Web based user interfaces. Once a TV user has tuned to a program he/she only has to
watch and listen. When offering clickable icons which are content driven, or clickable
objects, the user interface needs to be adapted to content based interactive TV. Several
user interface design questions have to be tackled, e.g.

• “How is it indicated on the screen that there is hidden information behind an
object without annoying the viewer and interfering with the plot too much?”

• “What is the response for the user on the screen once the user has selected an
icon?”

The degree of disturbance the user is willing to accept for the added value is 
another question. A user study is necessary to evaluate both feature usage and overall 
experience in order to answer those and many more questions. 

Since prototypes of STBs and head end systems supporting event and frame
synchronised data services are already available, interactive advertisement or hidden
product placement services will hit the market within the next two years. In the
following chapter an attempt is made to predict the evolution of interactive
services in a digital video environment.



4. Future Interactive Service Scenarios 

4.1 Intelligent Agents 

While the business moves towards globalisation the interactivity space will gravitate
towards unification considering the personal profile of the user/viewer. The huge
amount of redundant content conveyed to the user (content consumer) will be
filtered by intelligent agents, which present the relevant interactivity space. The agent
will learn about the user’s preferences, which are stored in a data base and updated in
an adaptive way. Together with the accompanying meta data the degree and the space
of interactivity are tailored to the user. Voice and fingerprint recognition or even other
means will be used for authentication.

This service model still relies upon the linear limited reception model for the video
stream but the interactive data related service offers a non linear component. The
one-to-many communication model will be augmented by a one-to-one or even by a
many-to-many communication model, e.g. every user’s hard disk may be accessed by
every other user. This certainly requires more powerful STBs, but as processor
power doubles every two years such STBs may appear on the market within the next
three to five years. In addition, video technologies will be developed which will
partially replace the linear reception model with a non linear presentation model. For
example, the viewer gets the option to compose and play a movie according to his/her
preferences while still having a plot. The services adapt to the actions of the viewers. A certain
degree of responsiveness of the viewer is taken into account for the composition of
the program or interaction space for the viewer.



4.2 Meaning of Audio Visual Information 

The increasingly pervasive role that audio visual sources are going to play in the
future of our life will make it necessary to develop new forms of representing
audio visual information, considering also e.g. the meaning of the information conveyed to
the user. At the user’s side the mood of the viewer or the environment also needs to be
taken into account, offering a certain space of interactivity for a specific type of
content (“Emotional Devices”). The current and upcoming standards like the object
based MPEG-4 standard are not suitable and need to be improved. In order to realise
such services the content must be augmented with physical and psychological
information which reflects the way human beings behave, e.g. depending
upon the mood, emotion and the environment of the viewer, movies containing scenes
and a plot that best fit the viewer will be offered for selection in the program guide.
In summary, new audio visual presentation means affecting both the content
production phase and the user’s interaction behaviour have to be researched in
the long term. This kind of service may reach the customer in about six to ten
years.

Abstract. In this paper we analyze a wireless channel model which is subject to
service interruption either because the channel is not available or the server is
serving other users. The wireless channel is modeled by a Markov chain
with two states corresponding to high and low error states, respectively. It is
assumed that the channel is slotted in time and no transmission is possible
during the high-error state. The traffic generated by a mobile user is modeled as
the superposition of independent two-state On-Off Markov sources, which are
statistically independent of the server. A source generates packets only during
an On state, and each packet fits into one channel slot. A discrete-time queuing
analysis derives the probability generating function (PGF) of the queue length
under the assumption of an infinite buffer. From the PGF, mean queue length,
mean delay and approximate finite buffer overflow probabilities are calculated.



1 Introduction 

Our study is motivated by packet transmission over wireless channels. The stated 
goal of wireless networks is ubiquitous communication, i.e., digital connectivity of 
any type, at any time and anywhere. First and second generation wireless networks
currently provide support for circuit-switched voice services as well as low-rate
circuit-switched and packet-switched data services. This has led to demands for
broadband services and the design of third-generation wireless systems, e.g.,
IMT2000. Wireless ATM is viewed as a general solution for third-generation
wireless networks capable of supporting multimedia [1] and it is a direct result of the
success of ATM on wired networks. ATM was originally designed for a bandwidth-rich
and nearly error-free medium (optical fiber), and it effectively trades off bandwidth
for simplicity in switching. In contrast, in the wireless environment, the radio channel 
is bandwidth-limited and the bit-error-rate (BER) is time varying, an effect of distinct 
propagation phenomena such as multi-path fading, shadowing, path loss, noise and 
interference from other users. The time-varying channel poses challenges to the 
design of wireless ATM. It is therefore important to obtain good understanding of the 
performance (e.g. loss and delay) of employing ATM over burst-error wireless 
channels. 

Early work in modeling wireless channel focused on the stochastic behavior of the 
channel at the physical layer, measured by received signal strength or bit-error-rate. 

P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 429-438, 2001 
© Springer-Verlag Berlin Heidelberg 2001 



430 M.M. Ali, X. Zhang, and J. F. Hayes 



Such physical layer models cannot be directly used to evaluate higher-layer network
performance, such as queuing delay and loss probability. For example, a few bit errors
within a packet, which is the basic data unit at the wireless link layer, will cause the
loss of the entire packet. It is therefore imperative to develop packet-level wireless
channel models, which can be used by network engineers to simulate and analyze the
higher-layer performance of wireless networks. The most commonly used model is the
Gilbert-Elliott model, which uses a two-state Markov chain to describe the channel [2].
In this channel model, each state corresponds to a specific channel quality that is
either noiseless or totally noisy, referred to as the "Good" state and the "Bad" state
respectively. Previous
studies show that these Markov chains provide a good approximation in modeling the
error behavior at the packet level in wireless fading channels.

The system under consideration is a wireless ATM network cell that consists of a 
base station and a number of mobile users. The base station provides the interface to 
the wired ATM network, thus the wireless ATM network extends the reach of the 
wired ATM network to the mobile users. Each mobile user may generate multi-media 
traffic which will be buffered at the mobile user. It will be assumed that the arrival
process to each user may be modeled as a superposition of mutually independent
On/Off sources, since sources of this type have been widely used to model broadband
traffic including voice and video [3]. The two states of the binary sources are “On”
and “Off”, respectively, with geometrically distributed sojourn times, measured in slot
times. In the “On” state, the source generates at least one packet. This work models
the transmission of one of the mobile users in this wireless ATM system. Since the 
wireless channel will be shared by a number of users, periods of channel 
unavailability may be due to the transmission of signals by other users as well as 
periods of fade. 

There is a significant amount of work on multiplexing in ATM. In [4], the fluid
approximation method was applied to analyze an infinite buffer ATM multiplexer,
which is loaded with the superposition of statistically independent and identical
On/Off sources. A matrix-analytic approach is used for a discrete-time queuing
analysis of an ATM multiplexer in [5]. In [6], a model with binary On/Off Markov
sources was presented and a discrete-time queuing analysis using a generating
function approach was developed. The functional equation describing the ATM
multiplexer derived in [6] was solved in [7] and the PGF of the queue length was
obtained. The servers in the queuing analyses mentioned above are all assumed to be
deterministic, which means the server is always available, corresponding to the
situation in a wired ATM network.

The salient feature of our model is service interruptions. There is a
good deal of prior work on both continuous and discrete-time queuing systems with
server interruptions [8, 9]. A discrete-time analysis of a system with a general service
interruption process and uncorrelated arrival process has been presented in [9]. Since
in this paper we assume a two-state Markov model for the server, this is a special case
of the server model in [9], corresponding to geometric On-times and geometric
Off-times. On the other hand, we assume a more complicated correlated arrival process as
opposed to the uncorrelated arrival process in [9]. A multiplexer operating
in a two-state Markovian environment, in which the arrival process and the availability
of the output channel in each state are different, is studied in [10]. In that work, the arrival process and
the server interruption process are not independent and they depend on the same
two-state Markovian environment. In contrast, we have a more general arrival process
resulting from m homogeneous, independent two-state Markov sources. Further, our
arrival process is independent of the server interruption process, which is more
appropriate to the applications environment under consideration. Thus the results of
this work advance the state-of-the-art on discrete-time queuing systems when both
arrival and server interruption processes are correlated.



2 Analytical Model 

In this paper, we model a mobile user in a wireless ATM multiplexing system as a 
discrete-time queuing system with infinite queue length and a single stochastic server. 
The time-axis is divided into equal slots and packet transmission is synchronized to 
occur at the slot boundaries. It is assumed that a packet can not leave the buffer at the 
end of the slot during which it has arrived and that a packet transmission time is equal 
to one slot. 

The arrival process of the system consists of m mutually independent and identical
binary Markov traffic sources, each alternating between On and Off states. We assume
that during an On slot a source generates at least one packet, while during an Off slot
the source generates no packet. State transitions of the sources are synchronized to
occur at the slot boundaries. A transition from the On to the Off state occurs with
probability $1-\alpha$ at the end of a slot, thus the number of slots that the source spends in
the On state is geometrically distributed with parameter $\alpha$. Similarly, a transition from
the Off to the On state occurs with probability $1-\beta$ at the end of a slot, and the number of
slots that the source spends in the Off state is geometrically distributed with parameter $\beta$.
When $\alpha$ and $\beta$ are high, packets tend to arrive in clusters; when
$\alpha$ and $\beta$ are low, packet arrivals are more dispersed in time.
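As a small worked example of the source model, the sketch below computes the mean sojourn times and the long-run fraction of slots a source spends in the On state. It assumes the usual reading of "geometrically distributed with parameter α", namely that a sojourn lasts n slots with probability α^(n-1)(1-α), which gives a mean of 1/(1-α) slots. The function name and the numerical values are illustrative only.

```python
def on_off_stats(alpha: float, beta: float):
    """Mean On and Off sojourn times (in slots) and stationary On probability."""
    mean_on = 1.0 / (1.0 - alpha)    # geometric sojourn, stay probability alpha
    mean_off = 1.0 / (1.0 - beta)    # geometric sojourn, stay probability beta
    # Fraction of time On = mean_on / (mean_on + mean_off)
    p_on = (1.0 - beta) / (2.0 - alpha - beta)
    return mean_on, mean_off, p_on


# Hypothetical source: On for 10 slots and Off for 20 slots on average.
mean_on, mean_off, p_on = on_off_stats(alpha=0.9, beta=0.95)
```

With these values the source is On for one third of the slots, so m such sources generating one packet per On slot offer m/3 packets per slot on average.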

The server is also modeled as a two-state Markov chain, which alternates between
Good and Bad states. We assume that during a slot, if the server is in the Good state, it
will transmit a packet if there are packets in the queue, while in the Bad state the server
will not transmit any packets whether there are packets in the queue or not. A
transition from the Good to the Bad state occurs with probability $1-\gamma$ and a transition from
the Bad to the Good state occurs with probability $1-\sigma$. As a result, the lengths of the Good
and Bad periods are also geometrically distributed with parameters $\gamma$ and $\sigma$,
respectively.
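The server model can be checked with a short simulation. The sketch below, with hypothetical function names and parameter values, runs the two-state chain slot by slot and compares the observed fraction of Good slots with the stationary value (1-σ)/(2-γ-σ) that follows from the transition probabilities given above.

```python
import random


def simulate_server(gamma: float, sigma: float, slots: int, seed: int = 0) -> float:
    """Fraction of slots the two-state server spends in the Good state."""
    rng = random.Random(seed)
    good = True
    good_slots = 0
    for _ in range(slots):
        good_slots += good
        stay = gamma if good else sigma   # probability of keeping the state
        if rng.random() >= stay:
            good = not good
    return good_slots / slots


def good_fraction(gamma: float, sigma: float) -> float:
    """Stationary probability of the Good state."""
    return (1.0 - sigma) / (2.0 - gamma - sigma)


# Hypothetical channel: mean Good period 20 slots, mean Bad period 2 slots.
observed = simulate_server(gamma=0.95, sigma=0.5, slots=200_000)
expected = good_fraction(gamma=0.95, sigma=0.5)
```

For this example the channel is available in about 91% of the slots, which is the long-run service rate seen by the queue.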

Now let us make the following definitions,
m = number of sources in the system.
$i_k$ = length of queue at the end of slot k.
$n_k$ = state of the server during slot k (‘1’ for Good and ‘0’ for Bad).
$a_k$ = number of On-sources during slot k.
[symbol lost in the source text] = number of packets that arrive during slot k.
$f_{j,k}$ = number of packets generated by the j’th On-source during slot k.

We assume that the $f_{j,k}$'s are independent and identically distributed from slot to slot
during an On period, with PGF $f(z)$ and average $\bar{f}$.

Fig. 1. Model of a wireless ATM multiplexing system

The queuing system under consideration can be modeled as a discrete-time
three-dimensional Markov chain. The state of the system is defined by the triplet
$(i_k, n_k, a_k)$. Let $Q_k(z,r,y)$ denote the joint probability generating function of
$i_k$, $n_k$ and $a_k$, i.e.,

$$Q_k(z,r,y) = E\left[z^{i_k} r^{n_k} y^{a_k}\right] = \sum_{i=0}^{\infty}\sum_{l=0}^{1}\sum_{j=0}^{m} z^{i} r^{l} y^{j}\, p_k(i,l,j) \qquad (1)$$

where $p_k(i,l,j) = \Pr(i_k = i, n_k = l, a_k = j)$. We have determined the steady-state joint
PGF, $Q(z,r,y)$, using a transform analysis, details of which may be found in [11].
Next, we present the steady-state PGF of the queue length, $P(z)$, from

$$P(z) = Q(z,r,y)\big|_{r=1,\,y=1},$$

$$P(z) = \left[(1-\sigma)\,p(0,0,0) + \gamma\,p(0,1,0)\right](z-1)\,\cdots \qquad (2)$$

[The remainder of equation (2) and equations (3) to (8) are missing from the source text.]


The probability that the system is busy is given by,

$$\rho = \frac{m(1-\beta)\bar{f}}{2-\alpha-\beta}\cdot\frac{2-\gamma-\sigma}{1-\sigma} \qquad (9)$$

and the stability condition of the system requires that $\rho < 1$.
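The busy probability has a simple interpretation as the ratio of the mean arrival rate, m·f̄·(1-β)/(2-α-β) packets per slot, to the long-run fraction of Good slots, (1-σ)/(2-γ-σ). The sketch below evaluates it under that reading, where α and β are the probabilities that a source remains On and Off respectively, γ and σ are the probabilities that the server remains Good and Bad respectively, and f̄ is the mean number of packets per On slot. The function names and the parameter values are illustrative assumptions.

```python
def busy_probability(m: int, alpha: float, beta: float,
                     gamma: float, sigma: float, f_mean: float = 1.0) -> float:
    """Load rho: mean arrivals per slot divided by mean service capacity per slot."""
    arrival_rate = m * f_mean * (1.0 - beta) / (2.0 - alpha - beta)
    service_rate = (1.0 - sigma) / (2.0 - gamma - sigma)
    return arrival_rate / service_rate


def is_stable(m: int, alpha: float, beta: float,
              gamma: float, sigma: float, f_mean: float = 1.0) -> bool:
    """The queue is stable only if rho < 1."""
    return busy_probability(m, alpha, beta, gamma, sigma, f_mean) < 1.0


# Hypothetical parameters: two sources, mean Good 20 slots, mean Bad 2 slots.
rho_two = busy_probability(2, 0.9, 0.95, 0.95, 0.5)
rho_three = busy_probability(3, 0.9, 0.95, 0.95, 0.5)
```

With these numbers two sources give a load of about 0.73, while a third source pushes the load above 1 and makes the queue unstable.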






Next, we present the mean queue length, $N$, which may be obtained by
differentiating equation (2) with respect to z and then substituting z = 1,

$$N = P'(z)\big|_{z=1} \qquad (10)$$

[The closed-form right-hand side of (10) and the auxiliary definitions (11) to (13) are
garbled beyond recovery in the source text.]
Finally, from Little’s result, the mean packet delay in number of slots is given
by,

$$\bar{d} = \frac{N(2-\alpha-\beta)}{m(1-\beta)\bar{f}} \qquad (14)$$

As may be seen, we present completely determined closed form expressions for the
PGF and the mean queue length. We note that the mean queue length may be very easily
calculated from (10), since the expression is in terms of the system parameters.
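The analytical results can be cross-checked by simulation. The following is a minimal sketch of the slotted model, assuming that each On source generates exactly one packet per slot and that a packet cannot leave during the slot in which it arrives. It estimates the mean queue length at the end of a slot and derives the mean delay from Little's result, using the measured arrival rate. The function name, the seed and the parameter values are assumptions, and the sketch is not the transform analysis of the paper.

```python
import random


def simulate_queue(m, alpha, beta, gamma, sigma, slots, seed=1):
    """Return (mean queue length, arrival rate, mean delay via Little's result)."""
    rng = random.Random(seed)
    on = [False] * m          # state of each source during the current slot
    good = True               # state of the server during the current slot
    queue = 0
    total_queue = 0
    total_arrivals = 0
    for _ in range(slots):
        # Serve one packet that arrived in an earlier slot, if the server is Good.
        if good and queue > 0:
            queue -= 1
        # Each On source contributes one packet during this slot.
        arrivals = sum(on)
        queue += arrivals
        total_arrivals += arrivals
        total_queue += queue  # queue length at the end of the slot
        # State transitions take place at the slot boundary.
        on = [(rng.random() < alpha) if s else (rng.random() >= beta) for s in on]
        good = (rng.random() < gamma) if good else (rng.random() >= sigma)
    mean_queue = total_queue / slots
    arrival_rate = total_arrivals / slots
    mean_delay = mean_queue / arrival_rate if total_arrivals else 0.0
    return mean_queue, arrival_rate, mean_delay


# Hypothetical scenario: two sources, mean Good 20 slots, mean Bad 2 slots.
n_est, lam_est, d_est = simulate_queue(2, 0.9, 0.95, 0.95, 0.5, slots=200_000)
```

For long runs the measured arrival rate approaches m(1-β)/(2-α-β), so the simulated delay can be compared with the value obtained from (14).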



3 Numerical Results 

In this section we present some numerical examples regarding the results of this
paper. The objective is to study the effect of different wireless link error
characteristics on the behavior of the ATM multiplexing system. The different
wireless channel characteristics are represented by the parameters $\gamma$ and $\sigma$, where
$1-\gamma$ and $1-\sigma$ are the probability of transition from the Good to the Bad state and the
probability of transition from the Bad to the Good state in each time slot, respectively. As
stated earlier, in the Bad state the channel is not available, either due to fading or because
the server is serving other users. It is assumed that each On source generates a single packet during
a slot, i.e. $f(z) = z$.

In Figures 2 - 4, we plot the steady-state mean of the queue length as a function of
the number of sources, m, with mean good and bad durations as parameters, which are
given by $1/(1-\gamma)$ and $1/(1-\sigma)$ respectively. As may be seen, for the same number of
sources, different mean queue lengths are obtained for different wireless link error
rates. From Figure 2, as the mean bad duration increases from 0 to 40 slots, the
corresponding steady-state mean of the queue length also increases. This is expected,
since the longer the bad duration, the longer the channel is in the Bad state, during
which the queue builds up. In Figure 3, the ratio of the means of bad and good
durations has been kept constant. As may be seen, for lower values of the number of
sources in the system the mean queue length does not vary much with the individual
values of the mean good and bad durations when their ratio is constant. In Figure
4, we also present the mean queue length at a constant ratio of the means of bad and good
durations but with transmission rate as a parameter. As expected, with increasing
transmission rate the mean queue length drops.

Figure 5 presents the mean packet delay as a function of the traffic load $\rho$ with
the number of sources in the system, m, as a parameter. The results are given for a
mean bad duration of 2 slots, a mean good duration of 20 slots, and the number of
sources m = 2, 5, 10 and 100. As may be seen, for any given load, an increase in the
number of sources leads to a rise in the mean packet delay.

From Figures 2 - 5, we also note that under heavy loading, there is a sharp increase 
in the mean queue length and the mean packet delay. This is the reason why we did 
not keep that part of the curves corresponding to loads higher than 0.8. 

Figure 6 presents the approximate probabilities of buffer overflow under different
buffer sizes. The approximate probabilities of overflow correspond to the
probabilities that the queue length will be greater than the chosen buffer size in the
infinite queue length system that we have studied. These probabilities may be
obtained by performing a Taylor series expansion of the PGF of the queue length and
summing up the appropriate coefficients, which may be done simply by using any of
the available symbolic software packages. As expected, for any given level of traffic, the
probability of overflow decreases as the mean bad duration decreases. Unfortunately,
because of finite precision problems we could not obtain probabilities of buffer
overflow for systems with larger buffer sizes.
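The approximation described above can be written out explicitly. Writing $p_n$ for the steady-state probability that the queue holds $n$ packets and $B$ for the chosen buffer size (both symbols are introduced here for illustration), the coefficients of the Taylor expansion of $P(z)$ about $z = 0$ are the queue length probabilities, and the overflow probability is approximated by the tail of the infinite-buffer distribution:

```latex
P(z) = \sum_{n=0}^{\infty} p_n z^n,
\qquad
p_n = \frac{1}{n!}\,\frac{d^n P(z)}{dz^n}\bigg|_{z=0},
\qquad
\Pr(\text{overflow}) \approx \Pr(\text{queue length} > B) = 1 - \sum_{n=0}^{B} p_n .
```

Since the tail is obtained as one minus a sum of $B+1$ coefficients, rounding errors in the $p_n$ accumulate as $B$ grows, which is one plausible reading of the finite precision problem mentioned above.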




Fig. 2. Steady-state mean of the queue length versus the number of sources, m, when the mean 
good duration is 400 slots 




Fig. 3. Steady-state mean of the queue length versus the number of sources, m, for constant
ratio of mean good to mean bad duration

Fig. 4. Steady-state mean of the queue length versus the number of sources, m, for constant 
ratio of mean good to mean bad duration and different channel transmission rates 




Fig. 5. Mean packet delay in number of slots versus traffic load with different number of
sources, for a given channel error condition

Fig. 6. Probability of buffer overflow versus buffer size



4 Conclusion 

In this paper we have presented a discrete-time single server queuing analysis of a 
mobile user in a wireless ATM multiplexing system. The features of the model are a 
two-state Markov chain server and a correlated arrival process, consisting of the 
superposition of independent binary Markov sources. We determine the steady-state
PGF of the queue length distribution as well as other performance measures.

Abstract. A decentralized multiple access technique for radio resource
allocation for Base Stations (BSs) is presented. The proposed Distributed
Dynamic Channel Reservation (DDCR) method applies to multicarrier wireless
dynamic TDMA LANs, where self-organized BSs act as hubs offering wireless
access to Mobile Terminals (MTs). The method is suitable for unlicensed
wireless environments, especially in scenarios where BSs of different
providers are installed in a common coverage area. In the proposed approach,
BSs compete to access and reserve time on separate frequencies, using a
dynamic TDMA/TDD technique. The paper introduces etiquette rules and
contention disciplines, while eliminating any hidden terminal phenomena.
Additionally, it defines carrier selection rules, allowing BSs to select the least
congested carrier, based on real time measurements. Finally, the paper evaluates
and compares the performance of the contention disciplines and MAC
techniques through simulations.



1 Introduction 

Wireless and mobile ATM (wmATM), as an emerging technology, is expected to
enhance the traditional services offered by the existing cellular and wireless systems.
Current implementations of wmATM utilize the 5 GHz spectrum [1]. ETSI is
currently developing standards for Broadband Radio Access Networks, which include
the HIPERLAN type 2 system. This short-range variant is intended for private use as
a wireless LAN. It will offer high-speed access (25 Mbit/s) to a variety of systems,
including ATM. For HIPERLAN type 2, spectrum has been allocated in the 5 GHz
range. Furthermore, the FCC has opened a 300 MHz unlicensed band in the 5 GHz
spectrum [2]. The basic feature of this band is that no company can monopolize a
portion of it. It would therefore appear sensible to consider the use of the 5 GHz band
for unlicensed medium and short-range wmATM networks. On the other hand, the
electromagnetic spectrum is limited. Thus, efficient methods for the allocation of the
available frequencies to mobile users and geographical cells remain critical. This paper
introduces a QoS based variation of the Distributed Dynamic Channel Reservation
(DDCR) method [3], [4]. According to the new method the BSs compete for and reserve
wireless resources in a multicarrier wmATM network, based on their QoS demands.

P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 439-448, 2001 
© Springer-Verlag Berlin Heidelberg 2001

2 System Assumptions

A wmATM system may be structured or unstructured. In the former case, all BSs can
communicate via terrestrial links and they can use a protocol to regulate their access
to the common wireless resources. In unstructured wmATM systems, the BSs cannot
use a protocol to coordinate their access to the wireless resources. In such a case, all
the BSs should follow a set of rules to compete for the radio resources. BSs following
etiquette rules are referred to as self-organized BSs. An example of an unstructured
wmATM system is illustrated in Fig. 1.




According to Fig. 1, different wmATM providers may install their wmATM networks 
in a common coverage area. In this high interfering environment, the installed BSs 
should first regulate their access to the shared wireless resources, and then spread the 
reserved resources to MTs. This paper assumes an unstructured system, which
consists of multicarrier wmATM networks, where the self-organized BSs offer
ATM wireless access to the MTs. For each wmATM network the transport
architecture is based on a MDR mechanism, which uses a dynamic TDMA/TDD 
MAC scheme, as proposed in [5]. Other variations of dynamic TDMA approach can 
be assumed as well [6], [7], and [8]. Each MT maintains an association with one of 
the BSs, until it performs a handover. Each BS offers ATM wireless access to the 
MTs that are associated with it, and includes a MAC protocol to share the resources 
among its associated MTs. We assume multicarrier wmATM networks, where the 
available spectrum is divided into M carriers. We consider N BSs in the system, each 
of which competes for and reserves one of the M carriers at a time (i.e., single 
transmitter assumption). All the BSs and MTs are slot synchronized, and they use the 
same slot length. Each self-organized BS of a wmATM network uses a dynamic 
TDMA/TDD method to schedule downlink (BSs to MTs), and uplink (MTs to BSs) 
connections' data, during its reservation on one carrier. Thus, when BSs reserve a 
carrier, they exchange information with their associated MTs based on a single carrier 
scheduling protocol, such as MDR or MASCARA [7], which attempts to optimize the 
transmission in one carrier. On the other hand, prior to the reservation of a single 
carrier, the BSs follow the DDCR multicarrier scheduling method, in order to select 
the most befitting carrier (in terms of traffic load). Thus, DDCR attempts to schedule 
efficiently the traffic load, by treating all the slots on all carriers as a two-dimensional 
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scheduling problem (time and carrier). We assume that all the BSs use the same 
transmitted power level e.g., W, and all the MTs use the same transmitted power 
level e.g., W. Normally P^tj'Pbs. For instance Tx power can be lOOmW (20dB) for 
small indoor coverage areas, or lOOOmW for outdoor larger coverage areas (i.e., 
HIPERLAN type 2 and U-NII middle band). To sense idle carriers, a threshold of Pth 
dBm (e.g. -100 dBm) is adopted for the BSs. To cope with Turn Around Times (TAT, 
that is time required to switch from receive to transmit mode and vice versa) we 
assume that all MTs require one slot. When BSs communicate with MTs (i.e., during 
their reservations) they use one slot for TAT; otherwise BSs TAT is considered 
smaller (e.g., during competition period). Likewise, for the Switch Carrier Time 
(SCT, that is time required to switch carrier) we assume that all MTs require one slot. 
When BSs communicate with MTs they use one slot for SCT; otherwise BSs SCT is 
considered smaller (e.g., during competition periods). 



3 Etiquette Rules of DDCR 

The DDCR scheduling algorithm is presented in [3] and [4], and only some details of 
this algorithm are given in this paper. To avoid congestion and interference 
conditions, DDCR separates control and data channels. Control channels are used to 
resolve contentions and to broadcast carrier status information. DDCR uses four 
special signal bursts. A signal burst is energy transmitted by a BS; such bursts are used to 
indicate certain conditions and to broadcast control information. Normally they use 
higher transmitting power levels than normal bursts (e.g., data slots of the MAC frame), 
i.e., if BP_BS W is radiated for signal bursts, then BP_BS > P_BS. The signal bursts are: 

• A BS transmits the Priority Burst Signal (PBS) during priority resolution phase, 
declaring its QoS demand. 

• A BS transmits the Request Burst Signal (RBS), during the competition phase, 
declaring information such as perceived delay, and reservation period request. 

• A BS transmits the End Burst Signal (EBS) at the end of its reservation on a carrier. 

• A BS that reserves a carrier transmits the Utilisation Burst Signal (UBS). 

3.1 DDCR Channels 

According to the DDCR process, once one or more interfering BSs sense an idle carrier, 
the DDCR superframe starts. This superframe consists of several channels (control and 
data), allowing BSs to resolve the competition, to exchange data with their associated 
MTs, to release the carrier, and to broadcast control information. The DDCR channels 
are: a) the Priority Resolution Channel, b) the Contention Resolution Channel, c) the 
MAC Channel, d) the EBS Channel, and e) the UBS Channel. 

Priority Resolution Channel (PR-CH). To cope with ATM QoS, each BS estimates 
its Reservation Priority (RP). In DDCR, each BS competes with its interferers in order to 
reserve a carrier for a time period equal to its TDMA frame time. Thus, prior to the 
PR-CH, assume that BS_j serves K_j ATM connections, classified as: 

• {C_1, C_2, ..., C_kC} real time connections (CBR and rtVBR) 

• {V_1, V_2, ..., V_kV} non real time connections (nrtVBR) 

where k_C + k_V = K_j. Real time connections (rtC) impose Cell Transfer Delay (CTD) 
requirements, whilst non real time connections (nrtC) impose Cell Loss Ratio (CLR) 
requirements. Each rtC C_i (0<i<k_C+1) introduces a transfer delay violation threshold, D_i. 
Each nrtC V_i (0<i<k_V+1) introduces a cell loss violation threshold, L_i. The RP for BS_j is [9]: 




The PR-CH period consists of a constant number of slots. This period is further 
divided into PR-CH minislots (p-slots). The order of each p-slot corresponds to a 
particular RP range. For instance, assuming a 5 p-slot granularity, the 1st p-slot of the 
PR-CH period corresponds to RP<0.2, the 2nd p-slot corresponds to 0.2<RP<0.4, and the 
last p-slot corresponds to 0.8<RP<1, as shown in Fig. 2. According to the estimated RP, 
BS_j broadcasts its PBS_j during the corresponding p-slot. If D_PBS is the duration of the 
PBS signal and D_ps is the duration of a p-slot, then D_ps > D_PBS and TAT < D_ps - D_PBS. 
This allows BSs to switch from transmit to receive mode and sense a PBS broadcast in the 
next p-slot. A backlogged BS, i.e., one with a low RP, senses the PBS burst of a BS 
with a higher RP, because the latter broadcasts its PBS in a higher order p-slot. 
Backlogged BSs select a new carrier to compete for. 
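
The priority resolution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, RP is assumed to lie in [0, 1], the p-slots are assumed to cover equal-width RP ranges, and boundary values are assigned to the higher p-slot (the text does not say how boundaries are handled).

```python
def p_slot_index(rp, n_slots=5):
    """Return the 1-based p-slot in which a BS with priority `rp` sends its PBS.

    Assumptions (not stated in the text): RP lies in [0, 1], the p-slots are
    equal-width RP bins, and RP = 1 is folded into the last p-slot.
    """
    if not 0.0 <= rp <= 1.0:
        raise ValueError("RP is assumed to lie in [0, 1]")
    return min(int(rp * n_slots), n_slots - 1) + 1


def priority_survivors(rps, n_slots=5):
    """Return the BSs that survive priority resolution.

    `rps` maps a BS name to its RP. A BS survives if no other BS transmits
    its PBS in a higher order p-slot.
    """
    slots = {bs: p_slot_index(rp, n_slots) for bs, rp in rps.items()}
    top = max(slots.values())
    return sorted(bs for bs, slot in slots.items() if slot == top)


if __name__ == "__main__":
    # BS2 and BS3 share the 4th p-slot, so both go on to the CR-CH.
    print(priority_survivors({"BS1": 0.35, "BS2": 0.65, "BS3": 0.70}))
```

Two BSs with different RPs can land in the same p-slot, which is exactly the case the contention resolution channel is meant to settle.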

Contention Resolution Channel (CR-CH). On the PR-CH we have adopted 
a dimensioning scheme (i.e., a number of p-slots) that represents RPs with a certain 
granularity. Thus, it is possible for two or more BSs to use p-slots of the same order to 
broadcast their PBSs, even if their RPs have different values (e.g., in the 2nd decimal 
digit of the RPs). To overcome this problem we introduce the CR-CH. 

During the CR-CH period each BS that survived the priority resolution phase broadcasts 
its reservation request (through the RBS) and learns the reservation requests of 
other interfering BSs. Reservation requests represent either the current MAC frame time 
length, or the mean reservation delay, or both. The CR-CH comprises an integer, but 
not fixed, number of slots, each of which is divided into a fixed number of minislots 
(c-slots), as Fig. 2 shows. The RBS signals are transmitted on contiguous c-slots. We 
introduce a granularity factor g, 0<g<1. If T is the reservation request (MAC length, 
delay, or both) in slots, then the RBS will use [g*T] c-slots for its transmission. If D_cs is 
the duration of a c-slot, then TAT < D_cs. This allows a BS to switch from transmit to 
receive mode and sense an RBS broadcast by another BS. 

Longest Job First (LJF) discipline 

According to this discipline the winner of the competition is the BS with the largest 
reservation request (i.e., having the largest MAC frame). Thus, if TFD_j is the number 
of slots that BS_j wishes to reserve on this carrier (i.e., its current time frame length), then 
BS_j transmits an RBS of [g*TFD_j] c-slots. The winner (survivor) is the BS 
that broadcasts the longest RBS. 

Delayed Job First (DJF) discipline 

The winner of the competition is the BS that experienced the highest mean delay in its 
previous reservation attempts on any carrier. Each BS records the last reserved slot on 
any carrier, say T_m, and switches to a carrier in order to compete for it. Then it 
calculates T_mean = mean(T_m). If BS_j is involved in the competition, it transmits an 
RBS_j signal of [g*T_mean] c-slots. The mechanism is identical for the LJF and 
DJF disciplines. In the former case the RBS length is proportional to the frame size, whilst in 
the latter case it is proportional to the experienced reservation delay. 




Fig. 2. BSs broadcast their reservation priorities through PBS signal bursts in the 
corresponding p-slot of the PR-CH (left figure). The BSs that survive are those broadcasting 
in the highest order p-slot, in our case the 4th p-slot. These BSs then broadcast their 
RBS signals during the CR-CH (right figure). The winner is the BS that broadcasts the longest RBS. 



Delayed and Longest Job First (DLJF) 

This is a combination of the LJF and DJF disciplines. The winner of the competition is 
the BS that experiences the higher reservation delay and requests the larger time 
frame. Thus, if T_m is the last reserved slot of BS_j on any carrier, the contention 
for a carrier involves BS_j, and TFD_j is the number of slots that BS_j wishes to 
reserve on this carrier (i.e., its current time frame length), then BS_j transmits an RBS_j 
of [g*(TFD_j + T_mean)] c-slots, where T_mean = mean(T_m). A backlogged BS senses the 
RBS burst of the BS with the higher reservation request, because the latter broadcasts an 
RBS that uses at least one more c-slot. Backlogged BSs select a new carrier, among the M 
candidates, to compete for. The survivor is the BS that has completed its RBS 
transmission, switched to receive mode, and sensed the carrier idle. 
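
As a rough illustration of the three contention disciplines, the sketch below computes the RBS length in c-slots and picks the BS with the longest RBS. The names `rbs_length` and `contention_winner` are hypothetical, the bracket [g*T] is assumed to mean rounding up to a whole number of c-slots, and the mean delay is taken as a given input. The text does not say what happens when two RBSs have equal length, so the sketch simply reports no winner in that case.

```python
import math


def rbs_length(discipline, g, frame_slots, mean_delay_slots):
    """RBS length in c-slots for one BS.

    frame_slots      -- requested frame length in slots (TFD in the text)
    mean_delay_slots -- mean reservation delay in slots (given as input here)
    Rounding up with ceil() is an assumption about the [g*T] notation.
    """
    if discipline == "LJF":
        request = frame_slots
    elif discipline == "DJF":
        request = mean_delay_slots
    elif discipline == "DLJF":
        request = frame_slots + mean_delay_slots
    else:
        raise ValueError("unknown discipline: " + discipline)
    return math.ceil(g * request)


def contention_winner(requests, discipline, g):
    """Return the BS with the longest RBS, or None on a tie (assumption).

    `requests` maps a BS name to a (frame_slots, mean_delay_slots) pair.
    """
    lengths = {
        bs: rbs_length(discipline, g, frame, delay)
        for bs, (frame, delay) in requests.items()
    }
    longest = max(lengths.values())
    winners = [bs for bs, n in lengths.items() if n == longest]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    requests = {"BS2": (40, 10), "BS3": (30, 30)}
    for name in ("LJF", "DJF", "DLJF"):
        print(name, contention_winner(requests, name, g=0.5))
```

With these inputs LJF favours the BS with the larger frame, while DJF and DLJF favour the BS that has waited longer.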



Medium Access Control Channel (MAC-CH). This period is used for data transfer, 
i.e., it accommodates the MAC frame. It consists of: 

• Frame Header Broadcast Channel (FHB-CH). Within this channel the BS broadcasts 
its MAC frame slot map to its associated MTs, i.e., in which slot each MT can send 
or receive data, and when the downlink, uplink, and MT contention periods start. 

• Down Link Data Channel (DLD-CH), with variable duration, accommodating 
information sent from BS to MTs. 
• TAT Channel (T-CH), which occupies one time slot and allows MTs or BSs to 
switch from receive to transmit mode and vice versa. 

• Up Link Data Channel (ULD-CH), with variable duration, accommodating 
information sent from MTs to BS. 

• MTs Contention Channel (MC-CH) which allows associated MTs, with no 
allocated ULD-CH slots, to request reservation slots, or accommodates association 
requests from MTs. 

• Frame Trailer Channel (FT-CH), which occupies one slot. Within this channel the 
BS broadcasts the FT to the associated MTs. The FT includes a visiting list of the carriers 
that the BS will visit sequentially until a successful reservation. 

End Burst Signal Channel (EBS-CH). This channel uses one slot, and it is used for 
the broadcast of the EBS signal. 

Utilisation Burst Signal Channel (UBS-CH). This channel uses one slot, and during 
this period the UBS signals are broadcast. This signal denotes the number of BSs that 
competed for or used the carrier during a recent period. Each BS_j continuously updates 
the value of UBS_{j,r} for each carrier F_r, where UBS_{j,r} denotes the UBS 
maintained by BS_j for carrier F_r (0<r<M). UBS_{j,r} is updated according to the rules 
presented in [3] and [4]. A BS that has reserved a carrier broadcasts a UBS after 
a predefined period of R slots. All the BSs know that the UBS signals are broadcast 
every R slots (UBS-CH). If BS_j uses a carrier F_r during the UBS-CH, it broadcasts 
UBS_{j,r} for this carrier; otherwise it receives the transmitted UBS. 




Fig. 3. DDCR superframe and its channels for 3 interfering BSs and one available carrier. 
During the PR-CH, BSs 2 and 3 use the same order p-slot to transmit their PBS, whilst BS 1 
uses a lower order p-slot. BS 1 loses the competition, whilst BSs 2 and 3 are allowed to 
broadcast their RBS during the CR-CH. BS 3 broadcasts the longer RBS and reserves the carrier. 



3.2 DDCR Process Steps 

When choosing a carrier, a BS should choose the carrier exhibiting the least 
congestion. BS_j keeps a list of Selection Parameters (SP_j), one per carrier F_r, as follows: 
SP_{j,r} = (CurrentTime - LastVisitTime_{j,r} + 1)/UBS_{j,r}. BS_j selects the carrier with the 
minimum SP_{j,r} value. More details are discussed in [3] and [4]. BS_j follows the 
steps below in order to select, compete for, reserve, and release a carrier. 

1. BS_j forms the SP list and selects min{SP_{j,r}}, say for r = c, i.e., it selects the carrier F_c. 

2. BS_j switches to carrier F_c and listens to it. If F_c is reserved by other BS(s), 
BS_j waits until it recognises an EBS; otherwise it goes to step 3. 

3. If BS_j receives an EBS transmitted by another BS, or if no other BS_k, k≠j, uses the 
carrier for two consecutive slots, the competition period starts. 

4. BS_j, based on its RP_j, transmits its PBS in the matching p-slot of the PR-CH. 

5. On completion of the PBS transmission, BS_j returns to listening mode 
on carrier F_c. If BS_j detects F_c busy (due to PBSs transmitted by other 
competing BSs in higher order p-slots), BS_j loses the competition for carrier 
F_c. BS_j should then change carrier (i.e., take the next carrier of the visiting list 
as the new F_c) and go to step 2. Otherwise it goes to step 6. 

6. BS_j estimates its reservation request and broadcasts its RBS during the 
CR-CH (according to the LJF, DJF, or DLJF discipline). 

7. On completion of the RBS transmission, BS_j returns to listening mode 
on carrier F_c. If BS_j detects F_c busy, BS_j loses the competition for carrier 
F_c. BS_j should then change carrier (i.e., take the next carrier of the visiting list 
as the new F_c) and go to step 2. Otherwise it goes to step 8. 

8. BS_j, as the winner, reserves the carrier and exchanges information with its 
associated MTs during the MAC-CH. 

9. During the FT-CH, BS_j uses the UBS_{j,k} values (k<M) to calculate the SP elements, 
produces the carrier visiting list, and broadcasts this list within the FT. 

10. On completion of the FT broadcast it transmits the EBS and goes to step 1. 
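
The carrier choice in steps 1 and 9 can be sketched as follows, taking the SP formula and the minimum rule literally as stated above. The function names are hypothetical, UBS values are assumed to be at least 1 so the division is defined, and ordering the visiting list by ascending SP is an assumption; the actual rules are those of [3] and [4].

```python
def selection_parameters(now, last_visit, ubs):
    """SP for every carrier: (CurrentTime - LastVisitTime + 1) / UBS.

    `last_visit` and `ubs` are per-carrier lists kept by one BS.
    UBS values are assumed to be >= 1 (assumption, avoids division by zero).
    """
    if len(last_visit) != len(ubs):
        raise ValueError("one LastVisitTime and one UBS per carrier expected")
    return [(now - visited + 1) / u for visited, u in zip(last_visit, ubs)]


def visiting_list(now, last_visit, ubs):
    """Carrier indices ordered by ascending SP (ordering is an assumption).

    The first entry is the carrier selected in step 1; the remaining entries
    are the carriers tried next after a lost competition.
    """
    sp = selection_parameters(now, last_visit, ubs)
    return sorted(range(len(sp)), key=lambda r: sp[r])


if __name__ == "__main__":
    # Three carriers, current time 100 slots.
    print(selection_parameters(100, [90, 50, 99], [2, 5, 1]))
    print(visiting_list(100, [90, 50, 99], [2, 5, 1]))
```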




Fig. 4. Comparison of the DLJF (LJF+DJF), LJF, and DJF disciplines for load class 1, for 
N=10 BSs, and varying number of available carriers. 



4 Simulation Environment and Results 

To evaluate the DDCR performance, simulations were performed using the OPNET 
simulator. In the simulations, each dynamic TDMA frame was assumed to contain a 
fixed number of 5 slots for the FH, FT, TAT, and MC channels. The channel speed was 
set to 20 Mbit/s. Each slot was 54 bytes long. For the simulations, we used a 
combination of CBR and VBR connections. For each CBR connection, a simple 
periodic ATM cell generator was used. We assumed 64 Kbps CBR connections and 
used 50 CBR sources (25 uplink and 25 downlink) per BS. Each VBR source, on the 
other hand, was modeled by a Discrete-time Batch Markov Arrival Process. We 
consider the slot as the time unit. According to [10], the traffic load produced by a 
VBR source can be approximated by the superposition of U equivalent ON/OFF 
minisources. For each VBR source, μ was set to 256 Kbps and σ² was set to 128 Kbps, 
where μ and σ² are the mean and the variance of the transmission rate, respectively. 
We considered two different load classes of VBR connections: for classes 1 and 2, we 
used 20 and 30 VBR connections per BS, respectively. 

From the simulations it is concluded that the delay observed with the RR or RC 
methods is higher than the delay obtained with the DDCR method. More particularly, 
for load class 1, the DDCR method achieves more than 10% and 30% delay improvement 
when compared with the RC and RR disciplines, respectively. For load class 2, the 
DDCR delay improvement varies from 15% to 35% when compared with the RC 
discipline, and from 50% to 80% when compared with the RR discipline. Fig. 4 
illustrates the delay obtained when the combination of the LJF and DJF disciplines 
is used, when only LJF is used, and when only DJF is used. From Fig. 4 we 
observe that the combination of LJF and DJF yields the smallest reservation delay. 
This observation was confirmed for load class 2, and for all the combinations of M 
and N, as well. 

Fig. 5 illustrates the fluctuation of the delay between two consecutive 
reservations, for 10 BSs, 4 available carriers, and load class 1. From Fig. 5 we 
conclude that the mean delay decreases with time. The final value of the mean delay, 
averaged after 1500000 slots, is about 6.7 slots. The delay fluctuation is 
attenuated smoothly with time and approaches the final mean value after 25000 
slots of simulation. Thus, it is concluded that the BSs maintain a realistic view of the 
congestion on the carriers after the first 25000 slots, for this simulation environment. 




Fig. 5. Delay variation for load class 1, for 10 BSs, and 4 available carriers. 

Fig. 6 shows the fairness of the DDCR method. According to this figure, for a 
wmATM system consisting of 10 BSs, for M=6, load class 1, and the DLJF 
discipline, BSs with identical traffic demands and an equal number of competitors 
experience similar reservation delays. Let NC_K be the set of BSs having K 
competitors, i.e., interferers. For this installation, the ten BSs fall into four such 
sets, containing three, two, four, and one BS, respectively. In Fig. 6 the notation 
BS_j(k) denotes that BS_j has k competitors. 
5 Concluding Remarks 

We have introduced QoS based competition rules for a distributed carrier reservation 
method, which applies to interfering BSs in an unlicensed multicarrier wireless ATM 
system. The DDCR mechanism is immune to topology changes (e.g., installation of 
new BS in a common area), does not increase power consumption on MTs, and 
requires no frequency preplanning. Furthermore, DDCR imposes no limit on the 
number of BSs operating in a common area. From the simulation results, we have 
observed that a combination of the LJF and DJF disciplines performs better in terms 
of reservation delay. Furthermore, the simulation results show that when combining 
DDCR process with congestion estimation the reservation delay stabilizes. Thus, the 
DDCR process could be combined with a distributed Wireless Call Admission 
Control. The latter could take into account the DDCR decisions, determine if the 
system is under heavy load, and regulate the admission policy, accordingly. 
Fig. 6. DDCR achieves fairness on reservation delay. 
An Adaptive Error Control Mechanism 
for Wireless ATM 



Peter R. Denz and Arne A. Nilsson 
North Carolina State University 



Abstract. In order to provide a ubiquitous telecommunications network which 
merges the concepts of wireless mobile communications and the use of ATM as 
a transport mechanism, the problem of the high and variable bit error rate of the 
wireless network must be addressed. This paper provides an adaptive error 
control coding feedback mechanism to alleviate this problem in wireless ATM 
networks. The scheme combines convolutional coding, optimal interleaving, 
puncturing, and channel state estimation to achieve an appropriate error 
correction scheme. Periodic feedback is provided to the source machine 
indicating changes in the channel state and triggering changes in the coding 
scheme. 



1. Introduction 

ATM has the potential of becoming ubiquitous on all computer platforms. Thus to 
avoid the problem of protocol conversion, we must provide ATM for wireless 
systems, as these will grow more in popularity. This wireless version of ATM has 
many problems which must be overcome before it can be a reality. The ATM 
protocol was designed with the assumption of a low-noise fiber physical medium and 
a fixed addressing structure. In a wireless network, the physical link is very noisy and 
the users are potentially mobile. Not only is the physical channel very noisy, but the 
noise level varies according to the current physical environment. In this paper we 
propose a dynamic error control coding mechanism for wireless ATM to lessen the 
detrimental effects of the high and variable noise which exists on this channel. 

The ATM architecture was designed to function over a low noise fiber channel, 
thus relaxing the need for error control inside of the network. An implementation of 
ATM over a noise-varying wireless channel, on the other hand, requires additional 
error control mechanisms to be in place. 

In general, error control can take two forms: Automatic Repeat reQuest (ARQ), 
which involves error detection and retransmission of erroneous messages, and 
Forward Error Correction (FEC), in which redundancy is added to the data to enable 
correction of some error patterns at the receiver. We propose the addition of FEC, in 
the form of convolutional coding, such that the high bit error rate of the wireless 
channel is lessened. Furthermore, we have developed an adaptive FEC mechanism 
such that more redundancy is provided in the face of a very noisy channel and less 
redundancy is provided in the face of a less noisy channel. 

In this paper, we discuss various techniques which combine to form our adaptive 
error control coding mechanism. Section 2 discusses the Gilbert channel model and 
parameter estimation, section 3 introduces the interleaving concept, section 4 
discusses puncturing, which will be used to dynamically change the coding rate, 
section 5 overviews the coding model, and sections 6 and 7 present the adaptive error 
control coding mechanism and simulation results. 

P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 449-458, 2001 
© Springer-Verlag Berlin Heidelberg 2001 



2. Gilbert Channel Model 



We model the noisy wireless channel using the Gilbert channel model, since it models 
burst noise well. This channel model consists of a two-state Markov chain. The 
states represent the good state, in which no error may occur, and the bad state, in 
which a bit is received correctly with probability h (i.e., in error with probability 
1 - h, the convention that the estimators below assume). The state transition 
probabilities are shown in the diagram of figure 1, where β is the probability of 
starting a burst (good to bad) and α is the probability of remaining in a burst (bad to 
bad). The expected length of the stay, in bits, in the good and bad state, given by the 
geometric distribution, is 1/β and 1/(1-α), respectively [8]. 

Fig. 1. Gilbert Channel Model 

Given a sufficiently long error vector, we can calculate estimators $\hat{\alpha}$, $\hat{\beta}$, and $\hat{h}$ 
for the current channel state as follows. Based on the error vector, calculate a = P[1], 
b = P[1 | 1], and c = P[1 | there exists a directly preceding and a directly succeeding 
error]. Finally, the estimators can be calculated as follows: 

$$\hat{\alpha} = \frac{ac - b^2}{2ac - b(a + c)}, \qquad \hat{h} = 1 - \frac{b}{\hat{\alpha}}, \qquad \hat{\beta} = \frac{a(1 - \hat{\alpha})}{1 - \hat{h} - a}$$

The derivation of these equations appears in [4]. 
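
A minimal sketch of the estimation step is given below. It assumes that α denotes the probability of remaining in the bad state, β the probability of entering it, and h the probability that a bit is correct while in the bad state, which is the convention under which the closed-form estimators hold. The function names are hypothetical, and the error vector is assumed to be a list of 0/1 values long enough for every conditional frequency to be defined.

```python
def error_statistics(errors):
    """Empirical a = P[1], b = P[1 | 1] and c from a 0/1 error vector.

    b is the fraction of errors that are directly followed by another error;
    c is the fraction of positions in error among the positions whose two
    direct neighbours are both in error.
    """
    n = len(errors)
    a = sum(errors) / n
    after_error = [errors[t + 1] for t in range(n - 1) if errors[t] == 1]
    b = sum(after_error) / len(after_error)
    framed = [
        errors[t]
        for t in range(1, n - 1)
        if errors[t - 1] == 1 and errors[t + 1] == 1
    ]
    c = sum(framed) / len(framed)
    return a, b, c


def gilbert_estimators(a, b, c):
    """Return (alpha_hat, beta_hat, h_hat) from the statistics a, b, c."""
    alpha = (a * c - b * b) / (2 * a * c - b * (a + c))
    h = 1 - b / alpha
    beta = a * (1 - alpha) / (1 - h - a)
    return alpha, beta, h


if __name__ == "__main__":
    # Exact statistics of a channel with alpha = 0.8, beta = 0.05, h = 0.4.
    a, b, c = 0.12, 0.48, 0.384 / 0.65
    print(gilbert_estimators(a, b, c))
```

For a real error vector, `gilbert_estimators(*error_statistics(errors))` chains the two steps; the expected good and bad burst lengths then follow as 1/β and 1/(1-α).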







3. Interleaving 

The convolutional encoder and Viterbi decoder pair perform best when errors are 
spread apart through the bit stream, providing what is called a guard space between 
bad bits. Unfortunately bit errors on the physical medium tend to be correlated and 
occur in bursts. This severely impacts the performance of the error correction coding 
system. To alleviate this problem, we spread the error bursts apart using a mechanism 
called an interleaver. The interleaver is simply a square matrix into which the bit 
stream is read in row-wise and read out column-wise. The uninterleaving mechanism 
performs the same function to properly reorder the bits at the destination. This 
interleaving is applied after the convolutional encoder at the sender and prior to the 
Viterbi decoder at the receiver. 
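
The square interleaver described above can be sketched as follows. The function names are hypothetical and the input is assumed to fill the matrix exactly (padding is left out). Because the matrix is square, applying the same function a second time restores the original order, which is why the uninterleaver can reuse it.

```python
def interleave(bits, side):
    """Write `bits` row-wise into a side x side matrix, read column-wise."""
    if len(bits) != side * side:
        raise ValueError("input must fill the matrix exactly")
    return [bits[r * side + c] for c in range(side) for r in range(side)]


def burst_positions_after_deinterleaving(side, burst_start, burst_len):
    """Original stream positions hit by a channel burst.

    The burst corrupts `burst_len` consecutive transmitted bits starting at
    transmitted position `burst_start`.
    """
    sent_order = interleave(list(range(side * side)), side)
    return sorted(sent_order[burst_start:burst_start + burst_len])


if __name__ == "__main__":
    data = list(range(16))
    sent = interleave(data, 4)
    print(sent)
    print(interleave(sent, 4) == data)
    # A 4-bit burst on the channel ends up spread 4 positions apart.
    print(burst_positions_after_deinterleaving(4, 4, 4))
```

With a 4 x 4 matrix, a burst of four consecutive channel errors lands on original positions that are four bits apart, which is the guard space the Viterbi decoder benefits from.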

The size of the interleaver matrix is an important parameter which we would like to 
choose carefully to yield optimal interleaver performance. Through simulation we 
have discovered the seemingly intuitive fact that the interleaver performance 
improvement is optimal when the interleaver matrix's side is equal to the expected 
bad burst length. This provides the maximum guard space between bad bits. 

An interleaver that is too small will cause bad bits to appear in pairs at the output of 
the interleaver, since the bad burst continues into the next row of the matrix. Even 
worse, an extremely small interleaver may not even contain a proper mix of good and 
bad bits. An interleaver that is too large may also lead to non-optimal performance, since 
multiple bad bursts may appear inside the matrix. Also, if the matrix is excessively 
large, the tail end of the message may not fill the entire matrix, thus 
requiring a significant overhead in padding bits which will have to be transmitted. 



4. Puncturing 



Puncturing is a technique in which bits are systematically removed from the output of 
the convolutional encoder according to a pattern specified by the puncturing matrix. 
For instance, the matrix 

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

specifies that every sixth bit is to be deleted. At 
the receiver, the same puncturing matrix is known, and, prior to the Viterbi decoder, 
random bits are inserted into the bit stream at the same locations where the bits were 
removed. This way, we are artificially adding errors to the bit stream and allowing 
the Viterbi decoder to recover from these errors. This yields a reduction in the 
amount of data which will be sent over the transmission channel. 

This puncturing technique allows us to modify the rate of the convolutional code 
while maintaining the same encoder/decoder pair. We will use puncturing to allow 
the error correction mechanism to adapt to current channel conditions [9]. 
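
The puncturing and un-puncturing steps can be sketched as below. The function names are hypothetical. The matrix is assumed to be read column by column, one row per encoder output stream, which is the reading under which the example matrix deletes every sixth bit. The receiver is described as inserting random bits; the sketch uses a fixed `fill` value instead so that the result is reproducible.

```python
def flatten_mask(matrix):
    """Read the puncturing matrix column by column (assumed bit order)."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def puncture(bits, matrix):
    """Drop every encoded bit whose mask entry is 0."""
    mask = flatten_mask(matrix)
    return [bit for i, bit in enumerate(bits) if mask[i % len(mask)]]


def depuncture(bits, matrix, total_len, fill=0):
    """Re-insert a placeholder bit at every punctured position.

    `total_len` is the length of the stream before puncturing. The paper
    inserts random bits; a constant `fill` is used here for determinism.
    """
    mask = flatten_mask(matrix)
    remaining = iter(bits)
    return [
        next(remaining) if mask[i % len(mask)] else fill
        for i in range(total_len)
    ]


if __name__ == "__main__":
    matrix = [[1, 1, 1], [1, 1, 0]]
    encoded = list(range(12))          # stand-in for 12 encoder output bits
    sent = puncture(encoded, matrix)   # positions 5 and 11 are removed
    print(sent)
    print(depuncture(sent, matrix, len(encoded), fill=-1))
```

In this example 10 of the 12 encoder output bits are transmitted, so the rate 1/2 code is effectively raised to rate 3/5.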
5. Coding Model 

We have developed a simulator based upon the model shown in figure 2. The 
simulator generates packets and encodes these packets using a (2, 1, 5) convolutional 
encoder. After encoding, the packet bit streams are punctured and interleaved 
according to the current coding scheme. These resulting bit streams are passed into a 
bursty channel simulator where errors are introduced to the data. Next, the streams 
are un-interleaved and un-punctured. Finally, the erroneous bit streams enter a 
Viterbi decoder which attempts to correct all the errors in the bit streams. 

If the decoder can correct all of the errors introduced into the message, then we 
construct the error vector representing the noise pattern seen on the channel 
(discussed in section 6), calculate estimators for the current channel state, and 
dynamically change the coding scheme according to these results. If the decoder is 
unable to correct all of the errors in the packet, a retransmission is performed. 




Fig. 2. Simulator Block Diagram 



6. Adaptive Error Control Coding Mechanism 

We would like to incorporate the aforementioned techniques into a dynamic error 
control coding mechanism which adapts to the long-term changing noise conditions of 
the wireless channel. In this mechanism we will sample the current channel noise 
levels at the destination, adjust the coding mechanism, if needed, and send a control 
packet back to the sending machine to communicate the adjustment. 

Sampling the current noise level on the channel is done at the destination machine 
upon receipt of a packet. The receiver attempts recovery of the original message by 
reversing the encoding sequence. If recovery is successful, which can be checked 
simply through a CRC calculation, then an error vector can be generated as follows; 
otherwise one or more uncorrectable errors have occurred and retransmission is 
necessary. 

The receiver can take the corrected message (k) and re-encode it using the same 
encoding sequence that had originally been used, to yield j of figure 3. Now we 
calculate j XOR r, where r is the received bit stream, to yield the desired error vector. 
Finally, we perform Gilbert model parameter estimation on this error vector to obtain 
$\hat{\alpha}$ and $\hat{\beta}$. Now we have measurements of the expected good and bad burst lengths. 



Fig. 3. Message Encoding/Decoding 



At each message receipt, we can calculate the expected good and bad burst lengths 
and compare these to their respective values of the last code adjustment point. If the 
percent deviation of either of these is greater than a certain threshold, then we may 
initiate a new coding method to account for the changed noise level. 
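
The two checks performed at each message receipt can be sketched as follows: building the error vector from the re-encoded and the received bit streams, and testing whether the burst-length estimates have drifted far enough to justify a new coding method. The function names and the threshold value are hypothetical; the text only speaks of "a certain threshold".

```python
def error_vector(reencoded, received):
    """Bitwise XOR of the re-encoded stream j and the received stream r."""
    if len(reencoded) != len(received):
        raise ValueError("both bit streams must have the same length")
    return [j ^ r for j, r in zip(reencoded, received)]


def needs_adjustment(current, reference, threshold_pct):
    """True if either burst length deviates by more than `threshold_pct`.

    `current` and `reference` are (expected_good_len, expected_bad_len)
    pairs; `reference` holds the values of the last code adjustment point.
    """
    return any(
        abs(cur - ref) / ref * 100 > threshold_pct
        for cur, ref in zip(current, reference)
    )


if __name__ == "__main__":
    print(error_vector([1, 0, 1, 1], [1, 1, 1, 0]))
    print(needs_adjustment((50, 5), (48, 5), threshold_pct=10))
    print(needs_adjustment((20, 5), (50, 5), threshold_pct=10))
```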

At each code adjustment point we potentially select a new optimal interleaver 
strategy and a new puncturing level. The interleaver selection comes directly from 
the channel noise estimate $\hat{\alpha}$, which tells us that the expected noise burst length 
is $1/(1-\hat{\alpha})$, and thus we use this as the side of the next interleaver matrix. 

Now we must choose an appropriate puncturing mask according to the current 
channel conditions. We have an array of available puncturing matrices to choose from, 
varying from a very aggressive mask which deletes one out of every four bits up to a 
mask which deletes no bits (i.e., no puncturing), with many in between these extremes. 
We perform a binary search over these available matrices, choosing a new one at each 
code adjustment point, until we find the most aggressive mask which does not yield 
uncorrectable errors. We do this by becoming more and more aggressive until we see 
our first uncorrectable error, at which point we back off the matrix to the previous one 
and maintain this one until either the noise level changes or another uncorrectable 
error is encountered. 
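
The code adjustment step can be sketched as below. The sketch follows the stepwise behaviour spelled out in the last sentence above (grow more aggressive, back off on the first uncorrectable error, then hold) and does not implement a true binary search. The class and function names are hypothetical, rounding the expected burst length to an integer side is an assumption, and so is backing off one further level when another uncorrectable error occurs while holding.

```python
def interleaver_side(alpha_hat):
    """Interleaver side from the expected bad burst length 1 / (1 - alpha).

    Rounding to the nearest integer (at least 1) is an assumption.
    """
    return max(1, round(1 / (1 - alpha_hat)))


class MaskSelector:
    """Pick a puncturing level; 0 = no puncturing, n_masks - 1 = most aggressive."""

    def __init__(self, n_masks):
        self.n_masks = n_masks
        self.level = 0
        self.locked = False   # True once a back-off has happened

    def update(self, decoded_ok, noise_changed=False):
        """Return the level to use after one code adjustment point."""
        if noise_changed:
            self.locked = False          # resume the search
        if not decoded_ok:
            self.level = max(0, self.level - 1)   # back off one level
            self.locked = True
        elif not self.locked and self.level < self.n_masks - 1:
            self.level += 1              # become more aggressive
        return self.level


if __name__ == "__main__":
    print(interleaver_side(0.8))
    selector = MaskSelector(n_masks=5)
    outcomes = [True, True, True, False, True, True]
    print([selector.update(ok) for ok in outcomes])
```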



7. Simulation Results 

To test this adaptive error control coding mechanism through simulation, we have 
constructed a noisy environment in which the noise levels (i.e., the α and β 
parameters) change periodically as illustrated in figure 4. This noise pattern is used as 
the basis for all of the following simulated runs and yields the raw channel bit error 
rate (before any error correction) shown in figure 5. 

The graph shown in figure 6 compares the performance of the dynamic coding 
mechanism to the various cases of static coding (using various puncturing matrices), 
in terms of the probability of frame retransmission. Of course the case of static 
coding with no puncturing has a lower probability of frame retransmission than our 
new dynamic coding mechanism. However, this case also puts many more bits out 
onto the line. Looking at the other curves, "Static; Punct = 1," which corresponds to 
the most aggressive puncturing matrix, $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$, is too aggressive in bit 
removal, as the probability of frame retransmission shoots up to 1.0. The remaining 
data series, "Static; Punct = 2," "Static; Punct = 3," and "Static; Punct = 4," represent 
the progressively weakening puncturing matrices 
$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, 
$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$, and 
$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{pmatrix}$, respectively. 
As we see the curves for the progressively weaker coding schemes, the probability of 
frame retransmission drops, as we expect, but remains worse than the dynamic coding case. 

Fig. 4. Dynamic Noise Pattern 

Fig. 5. Channel Bit Error Rate Before Error Correction 
Fig. 6. Probability of Frame Retransmission: Dynamic vs. Static Coding 

Some of these curves do not extend to the right edge of the graph. This is due to 
the fact that the simulation has ended. The simulations which entail aggressive 
puncturing matrices require more time due to all of the retransmissions that are 
necessary. The average percentage of bits on the line for the run for the dynamic 
coding method using this particular noise pattern is 96%, due to the high noise levels 
observed on this particular channel. Less noisy scenarios will, of course, yield a 
lower percentage of bits on the line. 

Figure 7 is based upon the same set of simulation runs as Figure 6. Now we are 
looking at a plot of the percentage of bits that actually get transmitted onto the line. 
As expected, all the static coding schemes have a constant percentage of bits going 
out onto the line, according to the puncturing matrix in use. However, the dynamic 
coding case varies over time as we constantly switch puncturing rates. 
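
The constant levels expected for the static schemes can be estimated directly from the puncturing matrices. The following is a minimal sketch, not the simulation code used for the figures: it assumes that a 1 in a matrix keeps the coded bit and a 0 deletes it, and that each matrix column corresponds to one input bit of the convolutional encoder. The function names are illustrative only.

```python
from fractions import Fraction

# Puncturing matrices for "Static; Punct = 1" to "Static; Punct = 4".
# Assumption: a 1 keeps the coded bit in that position, a 0 deletes it.
PUNCTURING_MATRICES = {
    1: [[1, 1], [1, 0]],
    2: [[1, 1, 1], [1, 1, 0]],
    3: [[1, 1, 1, 1], [1, 1, 1, 0]],
    4: [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]],
}


def fraction_on_line(matrix):
    """Fraction of the coded bits that is actually transmitted."""
    kept = sum(sum(row) for row in matrix)
    total = sum(len(row) for row in matrix)
    return Fraction(kept, total)


def punctured_rate(matrix):
    """Code rate after puncturing, assuming one input bit per matrix column."""
    kept = sum(sum(row) for row in matrix)
    return Fraction(len(matrix[0]), kept)


for index, matrix in PUNCTURING_MATRICES.items():
    print(index, fraction_on_line(matrix), punctured_rate(matrix))
```

Under these assumptions the most aggressive matrix transmits 3/4 of the coded bits and the least aggressive one 9/10, which is why the static curves are flat while the dynamic curve moves between such levels.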

Figure 8 allows us to see the correlation between the channel noise levels and the 
performance of the dynamic coding scheme. Notice that during periods of high noise, 
a less aggressive matrix is used. However, during low noise periods more aggressive 
matrices are used. 

In this comparison, however, one must look at both the values of the probability of 
starting a burst as well as the probability of remaining in a burst. Small changes in 
the former actually have stronger effects on the bit error rate than changes in the 
latter. This effect makes the dynamic noise graph deceptive in that the probability of 
remaining in a burst is visually more central in the graph and varies more, whereas the 
probability of starting a burst has a much smaller deviation range. To illustrate the 
difference in the effects of these two parameters, from the geometric distribution we 
see that varying the probability of starting a burst from 0.02 to 0.05 changes the 
expected good state stream length from 50 to 20. Likewise, varying the probability of 
remaining in a burst from 0.6 to 0.4 changes the expected bad burst length from 2.5 to 
1.67. Notice here that a small change in the probability of starting a burst yields a 
very significant change in the noise level, whereas a much larger change to the 
probability of remaining in a burst yields a very small change in the noise level. 
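
The figures quoted above follow from the mean of a geometric distribution. The sketch below reproduces them, assuming that the good-state run length is geometric with parameter equal to the probability of starting a burst, and that the burst length is geometric with parameter one minus the probability of remaining in a burst. The function names are illustrative.

```python
def expected_good_run(p_start):
    """Mean number of bits in the good state before a burst starts."""
    return 1.0 / p_start


def expected_burst_length(p_remain):
    """Mean burst length when each bit stays in the burst with p_remain."""
    return 1.0 / (1.0 - p_remain)


for p in (0.02, 0.05):
    print(f"P(start) = {p}: expected good run = {expected_good_run(p):.2f}")
for q in (0.6, 0.4):
    print(f"P(remain) = {q}: expected burst = {expected_burst_length(q):.2f}")
```

A change of 0.03 in the probability of starting a burst shortens the good run from 50 to 20 bits, while a change of 0.2 in the probability of remaining in a burst shortens the burst by less than one bit.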
Fig. 8. Noise Level and Puncturing Matrix Correlation 



8. Conclusion 

In this paper, we have proposed an adaptive error correction technique which could be 
used in wireless ATM networks. A strong error correction code scheme will be 
essential to the functioning of a wireless ATM protocol due to the high bit error rate 
of the underlying wireless channel; however, this scheme must also adapt itself to the 
changes in the current wireless channel conditions. We have seen the beneficial 
effects that interleaving has on the error correction capability by effectively spreading 
out burst errors and providing the necessary guard space for the Viterbi decoder. 
Also, we have shown that the ideal interleaver matrix has a side equal to the expected 
error burst length. 

Puncturing provides a convenient method by which the rate of the convolutional 
code can be changed without changing the actual encoder/decoder pair. 

In order to provide an effective adaptive error control mechanism, we must 
somehow gauge the current noise level of the channel. Gilbert's formulas provided 
good estimates for the channel parameters. We use this parameter estimation 
technique, in conjunction with the optimal interleaver prediction scheme and 
puncturing to provide a dynamic coding mechanism which balances the probability of 
frame retransmission and the coding rate. 

The dynamic error control coding mechanism developed in this paper is not 
restricted in any way to wireless ATM systems. Wireless ATM is just one current 
system under development which would benefit from its use. The mechanism 
would also benefit any other wireless networking system which suffers from a noisy 
environment with high and variable error rates. 

Many hurdles will have to be overcome before wireless ATM can become a reality. 
Bit error rates will no longer be negligible and will require forward error correction 
coding to hide this increased bit error rate from the higher layers on the protocol 
stack. Users will no longer be fixed in one location, thus complicating the handling of 
ATM QoS guarantees. Security is an increasingly important issue that will need to be 
addressed for the new environment. Once these and other problems are solved, 
computer networks will evolve in such a way that the underlying physical network 
will be hidden from the user, thus approaching our goal of ubiquitous and tetherless 
access to the computer network. 

Abstract. The next generation personal communication networks are expected to 
support multimedia services, and a general solution for this is viewed through wireless 
ATM. The requirements to provide real time multimedia traffic along with voice and 
data of varying priorities increase the need for QoS mechanisms in wATM. The 
problem of Admission Control is one of the most important issues in a Wireless ATM 
environment. This paper reviews significant strategies, including new algorithmic 
schemes, for Call Admission Control (CAC) in Wireless ATM Networks and evaluates 
their results. 



1 Introduction 

Wireless ATM technology has the ability to deliver high degrees of quality of service 
(QoS) while supporting multiple traffic classes on the same transmission path [5], [7]. 
In this context, congestion control through adequate buffering is becoming 
particularly significant to minimize the probability of cell loss and cell delay when 
multiple large traffic bursts are received concurrently at a switch. 

Wireless ATM is a connection-oriented service. One particularly important area in 
Wireless ATM networks is the control of network congestion. Its principal role is 
to protect the network and the user in order to achieve network performance 
objectives and optimize the usage of network resources [9]. Congestion control 
procedures can be classified into preventive control and reactive control. Preventive 
congestion control involves the following two procedures: call admission control 
(CAC) and bandwidth enforcement. Before a user starts transmitting over a Wireless 
ATM network, a connection has to be established, which is achieved at call setup. This 
Virtual Path between the sender and the receiver involves one or more ATM switches. 
On each of these ATM switches, resources have to be allocated to the new 
connection. The resource manager of the switch, among other management functions, 
accepts or rejects new connections and tears down old ones. If the new connection is 
accepted during a new call or a handoff procedure, bandwidth and/or buffers are 
allocated to it; these are released when the connection is terminated or the base 
station is changed. The CAC deals with the question of whether or not the switch can 
accept the new connection. 

Admission control ultimately decides whether to admit or reject the request to add 
a new flow or connection, based upon whether the newcomer would violate the delivery 
of QoS to existing flows or connections [10]. Furthermore, Admission Control involves 
each node checking every request against available capacity and current QoS 
capabilities. The node admits the request only if it can provide the requested QoS after 
adding the new traffic to that of the existing wireless connections. In this paper, the 
QoS issues related to Call Admission Control in wireless ATM networks are 
discussed analytically. The paper is organized as follows. Section 2 gives an 
extensive overview of the problem formulation regarding QoS in Wireless 
ATM. Section 3 presents the CAC algorithms and discusses the evaluation results. 
Finally, the last section gives the conclusions on the reviewed schemes. 



2 Modeling Wireless ATM Network - Problem Formulation 

Wireless ATM networks that support multimedia services with a QoS mechanism 
provide some challenges that are also met in cable networks, such as mobility, routing 
information accuracy, scalability, interoperability and adaptability [12], [13]. 
However, delivering hard QoS guarantees in the wireless domain is rather difficult, 
since assumptions made in providing QoS guarantees in wired ATM networks do not 
always hold in their wireless extension due to large-scale mobility requirements, 
limited radio channel resources and fluctuating network conditions. 

Quality of Service guarantees are one of the main advantages envisaged for ATM 
networks. While other networks such as IP networks do not guarantee QoS, ATM 
networks do, at the cost of higher complexity [32]. Therefore Wireless ATM networks 
should provide mobile QoS composed of three different parts: wired QoS, wireless 
QoS and handoff QoS. The wired QoS consists of the following basic parameters: link 
delay, cell delay variation, bandwidth, cell error rate, etc. [24], [30]. In the wireless 
QoS, the error rate is typically some orders of magnitude higher. Also, channel 
reservation and multiplexing mechanisms at the air interface strongly influence cell 
delay variation. In the handoff QoS, new parameters are introduced, such as handoff 
blocking, cell loss during handoff and handoff speed. The other major components 
of Wireless ATM are the wired infrastructure of the core network and the wireless 
access link. The network infrastructure provides the necessary mobility support 
(location management and connection handoff) to the end terminals. The prefix in 
each Wireless ATM terminal address is supplied by the switch to which the terminal 
is connected. In one proposed integrated scheme, radio ports act as switches with 
single or multiple Wireless ATM interfaces. The home address remains the same 
regardless of the mobile terminal’s location. When the terminal reaches another radio 
port, another network prefix is allocated to the terminal. This leads to a scheme where 
a mobile terminal maintains a virtual connection to its home switch. When the 
terminal changes location, a handoff procedure reroutes the location update 
virtual connection to the terminal’s new foreign switch. Together, these mechanisms 
ensure successful location management in a wireless ATM network. Connection 
handoff, on the other hand, is a procedure in which a user’s radio link is transferred 
between radio ports in the network without any interruption of the user connection. 
Handoff minimizes interference to the users when they move through neighboring 
cells and ensures the integrity of the radio connections [11]. Handoff allows users to 
move freely and communicate beyond the limits of a specific wireless area. The 
connections of a user terminal in a Wireless ATM network may need to be rerouted in 
cases of handoff [22]. 

Therefore the Admission Control Mechanism during a call setup or a handoff 
procedure is very important for Wireless ATM Networks [8], [19]. Call Admission 
Control (CAC) is a function commonly implemented in software in wireless ATM 
switches that determines whether to admit or reject connection requests. A connection 
request includes traffic parameters, along with either the ATM Service category, the 
requested QoS Class, or the user specified QoS parameters [27], [28]. Wireless ATM 
switches use CAC to determine whether admitting the connection request at 
Permanent Virtual Connection (PVC) provisioning time or Switched Virtual 
Connection (SVC) call origination time would violate the QoS already guaranteed to 
active connections. CAC admits the request only if the network can still guarantee 
QoS for all existing connections after accepting the request [7], [27], [31]. Frequently, 
each node performs CAC for SVCs and Soft Permanent Virtual Connections (SPVCs) 
in a distributed manner for performance reasons. A centralized system may perform 
CAC for PVCs. For accepted requests, CAC determines policing and shaping 
parameters, routing decisions, and resource allocation. CAC must be simple and rapid 
to achieve a high SVC call establishment rate; on the other hand, CAC must be 
accurate to achieve maximum utilization while still guaranteeing QoS [25]. CAC 
complexity is related to the traffic descriptor, the switch queuing architecture and the 
statistical traffic model. In general, a network uses the peak cell rate (PCR), the 
sustainable cell rate (SCR), and the maximum burst size (MBS) for the two types of 
CLP flows, as defined in the traffic contract, to allocate the buffer, trunk and switch 
resources. Peak rate allocation ensures that even if all sources send worst-case, 
conforming cell streams, the network still achieves the specified Quality of Service 
(QoS). Similar CAC algorithms using the SCR and MBS parameters also achieve 
lossless multiplexing [26]. CAC implementations may also permit a certain amount of 
resource oversubscription in order to achieve statistical multiplexing gain. CAC 
algorithms may also use a concept called equivalent capacity in an admission 
algorithm based upon a combination of the PCR, SCR, and MBS. The basic Traffic 
Management functions of a Wireless ATM network are shown in Fig. 1. 



3 Call Admission Control Schemes and Algorithms Evaluation 

As mentioned earlier, the CAC decides whether a new connection is going to 
be accepted or not. The decision is based on the effect that the new connection 
will have on the QoS and on whether the switch can provide the required QoS to the 
new connection [9], [11]. The main CAC schemes are classified into peak bandwidth 
allocation and statistical allocation. Peak bandwidth allocation, which is based on 
constant bit rate (CBR) services, is suitable for PCM-encoded voice, other fixed-rate 
applications, unencoded video and other very low bandwidth applications 
(telemetry). The advantage of peak bandwidth or nonstatistical allocation is that the 
decision to accept or reject a new connection is very easy to make [1], [2]. If the 
sum of the peak rates of all existing connections plus the peak rate of the new 
connection exceeds the capacity of the output link, then the new connection is not 
accepted. The disadvantage is also obvious: if the connections do not transmit at 
their peak rates, the output port link is underutilized. 
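
The peak bandwidth rule and its drawback can be illustrated with a short sketch. This is only an illustration under stated assumptions: the rates, the link capacity and the function names are hypothetical and are not taken from any of the reviewed schemes.

```python
def admit_peak_rate(existing_peak_rates, new_peak_rate, link_capacity):
    """Accept the new connection only if all peak rates fit on the output link."""
    return sum(existing_peak_rates) + new_peak_rate <= link_capacity


def utilization(mean_rates, link_capacity):
    """Fraction of the link actually used when sources send at their mean rates."""
    return sum(mean_rates) / link_capacity


# Hypothetical example: a 100 Mbit/s output link, rates in Mbit/s.
link = 100.0
peaks = [40.0, 30.0, 20.0]
means = [10.0, 8.0, 5.0]

print(admit_peak_rate(peaks, 10.0, link))   # the peaks sum exactly to the capacity
print(admit_peak_rate(peaks, 15.0, link))   # the peaks would exceed the capacity
print(utilization(means, link))             # the link is mostly idle on average
```

The second request is refused even though the admitted sources use less than a quarter of the link on average, which is the underutilization that statistical allocation tries to recover.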




Fig. 1. Wireless ATM Traffic Management Functions (time scales: cell time, burst 
duration, signal transfer time, connection life-time, time constant of call rate fluctuations) 



The problem that arises with statistical allocation is that it is difficult to characterize an arrival 
process and how it is shaped deep in the ATM network [4]. 
Furthermore, decisions must be made on the fly and thus must not be CPU intensive. The 
arrival process has been characterized by using Poisson processes, their discrete counterparts 
such as the Markov modulated Bernoulli process, and fluid processes. Several auto-regressive 
models have been proposed to characterize traffic due to video. The theory of self-similarity 
accounts for long-term correlation in the arrival process. The appropriateness of models for 
ATM traffic is based on a few parameters standardized by the ATM Forum. These parameters 
are the peak rate, the average rate, the cell delay variation for the peak rate and the maximum 
burst length. These are not adequate when it comes to bandwidth allocation. Burstiness and 
correlation are the two parameters that affect the QoS. 

For presentation purposes, the algorithmic schemes described in [33] are based 
on a non-blocking ATM switch where congestion takes place at the output ports. A variety of 
different call admission schemes have been proposed in the literature. Some of these schemes 
require an explicit traffic model and some only require traffic parameters such as the peak and 
average rate. In this paper we review some of these CAC schemes. The schemes have been 
classified into the following groups: 

• Equivalent capacity 

• Heavy traffic approximation 

• Upper bounds of the cell loss probability (CLP upper bound) 

The equivalent capacity scheme is based on a single source that feeds a finite 
capacity queue. The equivalent capacity of the source is then the service rate of the 
queue that corresponds to a predefined cell loss. The equivalent capacity methods are 
inaccurate in some situations. Like the equivalent capacity method, which is 
based on the asymptotic behavior of the tail of the queue length distribution, the 
heavy traffic approximation allocates bandwidth based on the same asymptotic 
behavior. The CLP upper bound scheme is based on the average number of 
cells that arrive during a fixed interval and the maximum number of cells that arrive 
in the same fixed interval. 

This classification is based on the underlying principle that was used to develop 
each scheme. In comparing these algorithmic schemes to each other, we focus on 
system throughput and class independence. When testing the effect of the buffer size, 
the cell loss probability algorithm (CLP upper bound) seems to be the least sensitive. 
When we change the required cell loss probability, we find that the least sensitive 
is the equivalent capacity scheme while the heavy traffic approximation is the most 
sensitive. The CLP upper bound seems to be sensitive as well. The algorithms differ 
in handoff admission. The equivalent capacity scheme completely shares the available 
bandwidth among all arriving handoff users, while the other algorithm uses a dynamic 
measurement-based reservation scheme to ensure that handoff users of each class 
achieve the required QoS. Also, the algorithm as formulated applies to a two- or 
three-dimensional network topology of arbitrary shape. It is assumed here that the network 
is uniform, i.e., the movement of the mobile is independent of location and direction. 
As such, the probability of handing off to or from any cell is the same as for any other 
cell. The other two algorithms that were introduced in [33] differ from the previous ones in 
that they use dynamically adjusted reservation partitions to control the blocking 
probability profile. The third algorithm is termed independent multimedia one-step 
prediction, complete sharing variant, and the fourth is independent multimedia 
one-step prediction, reservation variant. The mechanism used to control the relative 
call blocking probabilities is based on a performance measurement function and is 
updated periodically. 

Another Admission Control Scheme, using Genetic Algorithms, is proposed in [15]. 
A Genetic Algorithm (GA) starts with an initial population that consists of possible 
solutions to the optimization problem, the so-called individuals, and thereafter 
generates successively better populations to find the optimal solution [3]. 

The Simple Genetic Algorithm (SGA) consists of four components: 

• Initialization 

• Evaluation of the fitness function 

• Selection 

• Genetic Operators 

The initialization is based on either randomly selected seeds or excellent seeds. 
These seeds are from an alphabet of floating point numbers with values within 
the variables' upper and lower bounds. Every individual is evaluated to obtain its 
fitness. For selection purposes, a normalized geometric selection is used. Two 
genetic operators, mutation and crossover, are used. These operators are applied 
to the individuals. Mutation introduces new information to the population while 
crossover doesn’t. This characteristic of mutation allows it to overcome local 
optima. Crossover, on the other hand, involves the exchange of portions of two 
selected individuals. The Genetic Algorithm is dedicated to the allocation of fair 
bandwidth shares and after that takes advantage of any available bandwidth left from 
this allocation. To obtain an efficient solution, the proposed features comprise the 
generation of the initial population, fitness functions for fair bandwidth allocation 
and for any remaining available bandwidth, and a decomposition scheme if the number 
of calls is too large. The simulation results for the video component of a multimedia 
call show a significant gain for this algorithm in minimizing the probability of 
having a call rejected and in delivering acceptable QoS to the users. Moreover, the 
adaptive nature of the algorithm gives the advantage of increasing the QoS levels of 
the existing calls when one or more calls depart from the system. 
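
The four SGA components can be sketched as follows. This is a generic illustration and not the algorithm of [15]: the fitness function, the parameter values (including the selection parameter `q`), the single-point crossover and the uniform mutation are all assumptions chosen for brevity. The normalized geometric selection assigns rank r (rank 1 being the fittest of n individuals) the probability q(1-q)^(r-1) / (1-(1-q)^n).

```python
import random


def normalized_geometric_probs(n, q=0.08):
    """Selection probability for ranks 1..n (rank 1 is the fittest)."""
    q_norm = q / (1.0 - (1.0 - q) ** n)
    return [q_norm * (1.0 - q) ** (rank - 1) for rank in range(1, n + 1)]


def simple_ga(fitness, lower, upper, pop_size=30, generations=60,
              mutation_rate=0.1, seed=0):
    """Maximize `fitness` over real vectors bounded by `lower` and `upper`.

    `pop_size` is assumed to be even so that parents pair up exactly.
    """
    rng = random.Random(seed)
    dim = len(lower)
    # Initialization: random floating point seeds within the variable bounds.
    population = [[rng.uniform(lower[i], upper[i]) for i in range(dim)]
                  for _ in range(pop_size)]
    probs = normalized_geometric_probs(pop_size)
    best = max(population, key=fitness)
    for _ in range(generations):
        # Evaluation: rank individuals from fittest to least fit.
        ranked = sorted(population, key=fitness, reverse=True)
        # Selection: draw parents by rank with the normalized geometric law.
        parents = rng.choices(ranked, weights=probs, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            # Crossover: exchange the tails of two selected individuals.
            cut = rng.randint(1, dim - 1) if dim > 1 else 0
            children.append(a[:cut] + b[cut:])
            children.append(b[:cut] + a[cut:])
        for child in children:
            # Mutation: redraw a gene within its bounds (new information).
            for i in range(dim):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lower[i], upper[i])
        population = children
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(x):
    # Hypothetical objective with its maximum at x = (2, 2, 2).
    return -sum((xi - 2.0) ** 2 for xi in x)


best = simple_ga(toy_fitness, lower=[0.0] * 3, upper=[5.0] * 3)
print(best, toy_fitness(best))
```

In a bandwidth allocation setting, each gene would presumably stand for the share given to one call and the fitness would reward fair shares, but that mapping is not specified here and is left as an assumption.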

Another algorithm for Call Admission Control (CAC) in a wireless ATM 
network is described in detail in [6]. This algorithm is based on a threshold-based 
mechanism in a wireless ATM environment, which decides the admission of calls 
using the Available Bit Rate (ABR) mode, standardized by the ATM Forum for 
transmission on wired ATM networks [14], [20]. The mechanism uses a threshold to 
privilege handoff requests over new call requests. As in ABR, each user 
declares a Minimum Cell Rate (MCR) and a Peak Cell Rate (PCR) for each medium. 
The admission controller, one for each cell in the wireless access network, is an 
explicit rate switch, implementing the third-generation ABR strategy, the Explicit 
Rate (ER), by which it is possible to indicate directly the bandwidth that can be 
assigned to the connection by means of the ER field in the Resource Management 
(RM) cells. The system is a wireless cell, belonging to a cell cluster, with a specified 
number of channels in each cell; the input traffic comprises both new calls and calls 
coming from neighboring cells in the same cluster due to handoff. In this 
scenario, users arrive in the cell according to a Poisson distribution, with different 
arrival rates for the requests due to handoff and for the requests due to new calls. 
A call can be accepted only if the network can 
guarantee the minimum requirements. If, on the other hand, there are free channels, 
they are assigned, according to a proportional strategy, to the calls that were 
previously transmitting with fewer channels than their PCR. The system performance 
was studied in two cases with different numbers of channels in each cell. 
It was shown that a small increase in the number of cell channels produces 
great differences in system performance. Also, it was shown that all the curves exhibit 
a staircase shape, due to the fact that small variations of the thresholds do not affect 
system behavior. As was to be expected, the new-call blocking probability decreases 
when the threshold increases, but at the expense of the handoff loss probability. The 
curves coincide when the new-call admission threshold is equal to the number of 
channels, that is, when no priority is given to active calls coming from neighboring 
cells due to handoff. Finally, it was observed that the larger the number of users, the 
worse the QoS will be in terms of both blocking and loss probabilities. Thus, the 
performance of that strategy was evaluated by analyzing the influence of the system 
parameters in a case study. It was concluded that the threshold to be applied to 
improve the call loss probability due to handoff, by limiting the number of calls 
simultaneously active, should be chosen according to the number of users involved 
in a given cell. 
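
The trade-off governed by the threshold can be reproduced with a much simpler model than the one in [6]. The sketch below is an assumption-laden simplification: every call occupies exactly one channel (no MCR/PCR range), arrivals are Poisson, holding times are exponential with rate `mu`, and the cell is modeled as a birth-death chain. All names and rates are hypothetical.

```python
def admit(busy, threshold, channels, is_handoff):
    """New calls are admitted below the threshold, handoffs below capacity."""
    limit = channels if is_handoff else threshold
    return busy < limit


def blocking_probabilities(channels, threshold, rate_new, rate_handoff, mu=1.0):
    """Return (new-call blocking, handoff loss) for the birth-death model."""
    weights = [1.0]
    for busy in range(channels):
        # New calls contribute to the arrival rate only below the threshold.
        arrival = rate_handoff + (rate_new if busy < threshold else 0.0)
        weights.append(weights[-1] * arrival / ((busy + 1) * mu))
    total = sum(weights)
    pi = [w / total for w in weights]
    # New calls are blocked in states threshold..channels, handoffs only when full.
    return sum(pi[threshold:]), pi[channels]


# Hypothetical cell with 2 channels and unit arrival and service rates.
for threshold in (1, 2):
    print(threshold, blocking_probabilities(2, threshold, 1.0, 1.0))
```

Raising the threshold from 1 to 2 in this toy cell lowers the new-call blocking probability and raises the handoff loss probability, and the two values coincide once the threshold equals the number of channels, in line with the behavior reported above.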

Another scheme for Admission Control in Wireless ATM Networks is 
proposed in [23] in order to maintain the QoS traffic requirements for potential 
handoff calls. According to this scheme, the system decisions are based on the 
developed jitter bounds for CBR traffic and delay bounds for VBR traffic. The wireless 
ATM network consists of base stations (BSs) supporting different types of traffic, 
sending and receiving packets to and from all users through the downlink and uplink 
channels, the MSC and, of course, the ATM backbone network. Different priorities are 
given to the three ATM traffic types (CBR, VBR, ABR) and to the classes inside each 
type. More specifically, CBR and ABR have the highest and lowest traffic transmission 
priority, respectively. Also, inside each type, the classes with smaller jitter or delay 
tolerance take higher priority. In VBR, each call is regulated by a leaky bucket (LB). 
Packet transmission is therefore directed by the BS according to the preset priority. In 
the proposed CAC scheme, the QoS performance bounds are checked before an 
admission decision is taken for a handoff or a new call. The simulation 
results show that the proposed CAC scheme can achieve both a low handoff call 
dropping rate and high resource utilization. For this CAC scheme, the deterministic 
bounds for the nonpreemptive polling process have been used for QoS provisioning. It 
is anticipated that the performance could be further improved with stochastic 
bounds. 
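
The leaky bucket regulation applied to each VBR call can be sketched in its common token-based form. The scheme in [23] is not specified in enough detail here to reproduce it, so the class below is a generic regulator under assumed parameters: `rate` is the sustained rate in packets per time unit and `depth` is the largest burst tolerated. Both names are hypothetical.

```python
class LeakyBucket:
    """Token-style leaky bucket: tokens accrue at `rate` up to `depth`."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth      # start with a full bucket
        self.last_time = 0.0

    def conforms(self, arrival_time, size=1.0):
        """Return True if a packet arriving now fits the regulated profile."""
        elapsed = arrival_time - self.last_time
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last_time = arrival_time
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


bucket = LeakyBucket(rate=1.0, depth=2.0)
# Three back-to-back packets at t = 0, then one more at t = 1.
print([bucket.conforms(t) for t in (0.0, 0.0, 0.0, 1.0)])
```

A burst larger than the bucket depth is cut off until enough time has passed, which is what makes a deterministic delay bound for the regulated VBR call possible.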



4 Conclusion - Brief Discussion 

Wireless communications have undergone a tremendous growth in recent years. With 
systems for mobile analog and digital cellular telephony, radio paging, and 
microwave/satellite broadcasting becoming widespread, next generation wireless 
communications systems such as wireless ATM (WATM) will be required to support 
the seamless delivery of voice, video and data with high quality. Delivering hard QoS 
guarantees in the wireless domain is rather difficult since assumptions made in 
providing QoS guarantees in wired ATM networks do not always hold in their 
wireless extension due to large-scale mobility requirements, limited radio channel 
resources and fluctuating network conditions. 

In cellular-based wireless networks, the quality of service (QoS) provisioning 
problem is more challenging due to wireless channel fading, bit error rate (BER), and 
mobility. Fading, in addition to BER, causes packet loss and delays due to 
retransmissions. Bandwidth availability is highly unpredictable due to time and 
spatial dependencies, in addition to fading effects. During handoff, a mobile that was 
granted certain QoS guarantees could be deprived of such guarantees or even dropped 
altogether. Hence, the bandwidth allocated to a call, during setup phase, could be 
decreased significantly during the call lifetime. 

Therefore, a significant contribution of our approach to CAC is the consideration 
of two conflicting goals: to support the dynamic and transient nature of device 
mobility and QoS adaptation; and to limit the disturbance incurred by QoS 
renegotiation and handoff. This paper evaluates efficient algorithms for managing and 
controlling Wireless ATM networks, which are a key prerequisite for the successful 
deployment of this networking technology. Several research initiatives aim at 
enabling researchers and engineers to understand the traffic characteristics and their 
impact on control mechanisms through measurement, simulation and analytical 
studies. Further research work should be done and evaluated in the future, as 
Admission Control is still an open and important issue for the successful development 
of wireless communications, and especially for wireless ATM networks. 

466 D.D. Vergados et al. 

Abstract. The provision of Quality-of-Service to multimedia applications 
in the Internet is currently a hot research topic. Many approaches, 
architectures, models and protocols have been proposed in the literature. 
The proposals impact on the whole system architecture. So far, only a 
few of them have been implemented and tested. The standardization 
process is far from complete. In this paper, we discuss and compare 
three multicast routing algorithms that have been proposed in the 
literature, aiming at characterizing tree structures according to the QoS 
constraints imposed by either the traffic sources or the recipients. Two 
of those algorithms have been evaluated by means of simulations. 

Keywords: Quality-of-Service, admission control, multicast routing, 
performance evaluation. 



1 Introduction 

The deployment of protocols and architectures supporting Quality-of-Service 
(QoS) in IP networks is strongly demanded by the growing diffusion of multimedia 
and real-time applications for the Internet. Those protocols must provide 
multicast support to fulfil the requirements of a large part of the applications, 
characterized by a 1-to-many or many-to-many interaction schema. Many proposals 
have been presented in the literature, aiming at either defining QoS-sensitive 
functional architectures [?], or providing possible implementations 
for the architectural modules [?]. So far, however, few proposals have been 
realized and tested. 

In this work, we analyze three approaches proposed in the literature to perform 
the admission control for QoS multicast traffic. Those approaches make it possible to 
characterize tree structures for the forwarding of QoS traffic, in either int-serv 
domains or access networks to diff-serv backbones. We compare the performance 
of these approaches by means of simulation techniques. The paper is structured 
as follows: in Section 2, we introduce the system model considered throughout 
the work. In Section 3, we describe the different approaches proposed in the 
literature to the QoS-sensitive multicast routing problem, focusing in particular 
on the QoSMIC [?], QoSCBT [?] and QoSPIM [?] algorithms. In Section 4, we 
analyze the pros and cons of those three protocols, while in Section 5, we discuss 
the simulation results obtained by implementing QoSMIC and QoSCBT in 
the framework of the NS-2 simulation package. Some concluding remarks are 
presented in Section 6. 

niques for end-to-end Quality-of-Service control in multi-domain IP networks”. 



2 The System Model 

A network $\mathcal{N}$ can be represented as a weighted graph $G = (V, E)$ where the 
vertices in $V$ correspond to the network routers and the edges in $E$ correspond to 
the links connecting the routers. Let $N = |V|$ be the number of routers in $\mathcal{N}$. The 
weight of an edge $e$ is a vector of $n$ elements. Each element represents the quality 
of the link in terms of a different metric. Examples of metrics are the bandwidth, 
the transmission delay and the packet loss probability. The metrics may be either 
static or dynamic. The static metrics (e.g., link capacity, reliability) only depend 
on the network technology. By contrast, the dynamic metrics (e.g., bandwidth 
availability, current queueing delay) depend on the current network status. The 
static metrics do not reflect the current traffic situation; considering them for 
the QoS routing may lead to inaccurate decisions, thus failing in providing the 
requested service level. On the other hand, the maintenance of updated estimates 
of the dynamic metrics can be costly. 

The weight of a path $p$ between two nodes $u, v \in V$ is computed from the 
weights of the links forming $p$. Two weight elements compose differently, depending 
on whether the corresponding metric is additive (such as the delay), 
multiplicative (such as the loss probability) or concave (such as the bandwidth) 
[?]. In this work, we indicate with $m(u, v)$ the value of the metric $m$ along the 
path from $u$ to $v$ provided by the unicast routing protocol. 
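
The three composition rules can be made concrete with a short sketch. It is an illustration only: the function names and the link values are hypothetical, and the loss probability is treated as multiplicative through the per-link success probability $1 - p$, which is a common convention assumed here rather than taken from the text.

```python
import math


def path_weight(link_values, kind):
    """Compose per-link metric values into a path value."""
    if kind == "additive":          # e.g. delay: sum over the links
        return sum(link_values)
    if kind == "multiplicative":    # e.g. success probability: product
        return math.prod(link_values)
    if kind == "concave":           # e.g. bandwidth: the bottleneck link
        return min(link_values)
    raise ValueError(f"unknown metric kind: {kind}")


def path_loss_probability(link_losses):
    """Path loss, assuming independent losses on the individual links."""
    return 1.0 - math.prod(1.0 - p for p in link_losses)


# Hypothetical three-link path.
print(path_weight([2.0, 5.0, 3.0], "additive"))        # total delay
print(path_weight([100.0, 40.0, 80.0], "concave"))     # bottleneck bandwidth
print(path_loss_probability([0.01, 0.02]))             # end-to-end loss
```

A host request vector of $n$ elements would then be checked element by element against path values composed in this way, each with the rule that matches its metric.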

An application $s$ generating QoS traffic uses a session announcement protocol 
(e.g., sdr [?]) to announce the needed session information such as the transmission 
start time and the multicast address of the destination group. According 
to the int-serv model [2], a host $h$ that wants to receive the flow of $s$ joins the 
corresponding destination group $\mathcal{G}_s$: $h$ requests to graft onto the distribution tree 
$\mathcal{T}_s$ for $\mathcal{G}_s$ and it specifies the QoS it wants to receive, represented as a vector $req_h$ 
of $n$ elements. In this paper, we only focus on the int-serv model, because in 
the diff-serv model [?] many problems arise concerning the support to dynamic 
changes of the multicast group membership that deserve further study. 

3 Proposed QoS Multicast Routing Policies 

Several algorithms have recently been proposed in the literature to compute tree 
structures to forward QoS multicast traffic. They differ in the number 
and nature of the considered metrics, and in the methods used to characterize 
the tree. The (sub)optimal tree may be searched for using heuristics, since 
computing an optimal path subject to more than one additive metric constraint 
is an NP-complete problem. 

Some proposals present algorithms that exploit the knowledge of 
the network topology and resource availability, the sources, destinations and 
resource requirements of the QoS traffic, to search for a tree satisfying the QoS 
constraints. Part of the needed information may be obtained for instance from 
an underlying link-state protocol. However, the specifications of those algorithms 
usually disregard the implementation issues. Neither the system architecture nor 
the communication pattern among the involved parties is detailed. 

In this paper, we focus on algorithms which have a distributed control, 
possibly derived by adding QoS-awareness to standard multicast routing protocols. 
In particular, we have studied the behaviour of the QoSMIC, QoSCBT 
and QoSPIM multicast routing protocols. All these protocols aim at 
characterizing a tree structure T that connects the hosts in a group Q, and is shared 
among all the sources sending traffic to Q. Let R be the root of T. T is such that, 
∀g ∈ Q, a path p exists in T from R to g so that, for each metric m_i, 1 ≤ i ≤ n, p 
satisfies m_i(R, g) ≤ req_g[m_i] if m_i is either an additive or a multiplicative 
metric, or m_i(R, g) ≥ req_g[m_i] if m_i is a concave metric. The algorithms can support 
heterogeneous recipients, that is, recipients having different QoS requirements. 
We describe them in detail in the next sections. 
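
The feasibility condition on a path can be read as a simple per-metric test. The
following Python sketch is only an illustration under assumed names
(`satisfies`, the metric labels and the numeric values are hypothetical): additive
and multiplicative metrics are treated as upper bounds, concave metrics as lower
bounds.

```python
def satisfies(path_metrics, req, kinds):
    """Return True if the path values meet every requirement in req.

    path_metrics, req: dict metric name -> value
    kinds: dict metric name -> "additive" | "multiplicative" | "concave"
    """
    for name, bound in req.items():
        value = path_metrics[name]
        if kinds[name] == "concave":
            if value < bound:       # e.g. too little bandwidth
                return False
        elif value > bound:         # e.g. too much delay or loss
            return False
    return True


kinds = {"delay": "additive", "loss": "multiplicative", "bandwidth": "concave"}
req_g = {"delay": 10.0, "bandwidth": 1.0}   # hypothetical request of a recipient g
ok = satisfies({"delay": 7.0, "loss": 0.01, "bandwidth": 1.5}, req_g, kinds)
too_slow = satisfies({"delay": 12.0, "loss": 0.01, "bandwidth": 1.5}, req_g, kinds)
```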



3.1 QoSMIC 

According to the QoSMIC protocol, a new router n_r joining Q sends its 
request to R, which forwards it along T to identify a subset of the nodes in 
T as the candidates for grafting n_r to the tree. Several heuristics can be used to 
characterize the set of candidates; we adopted the multicast tree search policy 
with the directivity mechanism for the selection of the candidates. A node n_c ∈ T 
is a candidate if the quality of the unicast path connecting n_c to n_r, according to 
the static metrics, is such that the QoS requested by n_r can be provided and n_c 
realizes a local optimum in terms of quality-of-path, that is, it offers a path whose 
quality is equal to or better than that offered by its ancestors in the tree. Each 
node in T writes its estimated QoS towards n_r into the request forwarded along 
the tree, if it is better than that currently reported, to inform its descendants 
about it. The candidates send a bid message to n_r; the bid is used to evaluate 
the quality-of-path according to the dynamic metrics, and establishes a tentative 
reservation in the traversed nodes. n_r waits for an interval Δt to receive the bids 
from all the candidates. Based on the quality estimates carried by the received 
bids, n_r chooses the router (and the branch) that offers the best QoS. n_r sends 
the chosen candidate a graft message, which establishes the resource reservation 
in the traversed nodes. The tentative reservations for the other candidates are 
removed upon a timer expiration. Graft messages are periodically re-sent to 
refresh the reservations. QoSMIC can consider only one quality-of-path metric 
at a time (i.e., n = 1); it requires an underlying QoS-sensitive unicast routing 
protocol that finds “good” paths according to that metric. As a router only uses 
the static metrics to decide whether it is a candidate, no status information 
exchange amongst nodes is needed upon join/leave events. 
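
The candidate selection and the final choice among the bids can be sketched as
follows. This is a simplified, centralized illustration, not the distributed
protocol itself: the tree is given as a child map, a single concave metric is
assumed (larger is better), and the names `find_candidates`, `choose_bid` and
all values are hypothetical.

```python
def find_candidates(children, root, quality, requested):
    """Walk the tree from the root, carrying the best quality seen so far.

    A node is a candidate if its static quality towards the joining router
    meets the request and is at least as good as that of all its ancestors.
    """
    candidates = []
    stack = [(root, float("-inf"))]          # (node, best quality among ancestors)
    while stack:
        node, best_above = stack.pop()
        q = quality[node]
        if q >= requested and q >= best_above:
            candidates.append(node)
        best = max(best_above, q)            # value reported to the descendants
        for child in children.get(node, []):
            stack.append((child, best))
    return candidates


def choose_bid(bids):
    """Pick the candidate whose bid reports the best dynamic quality."""
    return max(bids, key=bids.get) if bids else None


# Hypothetical tree: R has children a and b; a has child c.
children = {"R": ["a", "b"], "a": ["c"]}
quality = {"R": 1.0, "a": 2.0, "b": 0.5, "c": 1.5}
candidates = find_candidates(children, "R", quality, requested=1.0)
chosen = choose_bid({"R": 0.8, "a": 1.4})
```

In this example b fails the request and c is worse than its ancestor a, so only
R and a send bids.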

3.2 QoSCBT 

QoSCBT can consider multiple metrics simultaneously. A host h wishing to 
join the group Q sends its request on the unicast path toward R, with its QoS 
requirements and an indication of whether it is only a recipient in Q or 
may also be a source for Q. Depending on this information, the join request is 
processed differently. Each intree node maintains status information 
concerning the dynamic metrics measuring the local resource availability. The status is 
updated according to a soft-state approach via the periodic exchange amongst 
neighbours of control messages, which also refresh the resource reservations. An 
intree router r records: 

— for each additive metric m_a: both the maximum value of m_a from r to 
any downstream destination, M_d[m_a] = max{m_a(r, dest) ∀ r's downstream 
destinations}, and the maximum value of m_a from any downstream source 
to r, M_s[m_a] = max{m_a(src, r) ∀ r's downstream sources}; 

— similar information is maintained for multiplicative metrics; 

— for a concave metric m_c: the local value of m_c for each downstream interface, 
and the number of downstream sources. 

A join request is caught by the first intree node r it encounters while traveling 
towards R, and it is processed as follows: 

— for an additive metric m_a: the request is accepted if M_s[m_a] + m_a(r, h) ≤ 
req_h[m_a] and, if h is also a source for Q, M_d[m_a] + m_a(h, r) ≤ req_h[m_a]. 
Similarly for multiplicative metrics; 

— for a concave metric m_c: the request is accepted if m_c ≥ req_h[m_c] for r's 
downstream interface to h. 

If the request is rejected, a notification is sent to h. Otherwise, for additive and 
multiplicative metrics the request is forwarded up to R, with each intermediate 
router repeating the test. In case of success, R sends an acknowledgement to h; 
each router that receives the ack performs the appropriate resource reservation, 
and updates its state if needed. For concave metrics, if h is not a source the 
request is immediately acknowledged by the first router r performing the test, 
which allocates the requested resource. Otherwise, if r has no other 
downstream source, it installs a tentative state and forwards the request upstream to 
perform the resource reservation. The procedure is repeated until an ancestor is 
encountered, which already has downstream sources, or the request is rejected. 
The described mechanism works for many-to-many applications with one active 
source at a time; this way, the reservation can be shared among all the sources. 
To avoid conflicts, QoSCBT imposes that a node processes only one join request 
at a time; other requests received in the meantime stay waiting until the (n)ack 
for the previous request has been sent. 
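
The local test performed by an intree router r on a join request can be sketched
as below. The sketch covers additive and concave metrics only and uses assumed
names throughout (`accept_join`, the state keys `M_s`, `M_d`, `local`, and the
sample values are hypothetical); forwarding towards the root and the reservation
steps are omitted.

```python
def accept_join(state, req, to_h, from_h, is_source, kinds):
    """Local admission test at router r for a join request issued by host h.

    state[m]  : {"M_s": .., "M_d": ..} for additive metrics,
                {"local": ..} for concave metrics (interface towards h)
    to_h[m]   : value of additive metric m on the path r -> h
    from_h[m] : value of additive metric m on the path h -> r
    """
    for name, bound in req.items():
        if kinds[name] == "additive":
            # Traffic from the farthest downstream source must reach h in time.
            if state[name]["M_s"] + to_h[name] > bound:
                return False
            # If h also sends, its traffic must reach the farthest destination.
            if is_source and state[name]["M_d"] + from_h[name] > bound:
                return False
        elif kinds[name] == "concave":
            if state[name]["local"] < bound:
                return False
        else:
            raise ValueError(f"metric kind not covered by this sketch: {kinds[name]}")
    return True


kinds = {"delay": "additive", "bandwidth": "concave"}
state = {"delay": {"M_s": 3.0, "M_d": 6.0}, "bandwidth": {"local": 1.2}}
req_h = {"delay": 8.0, "bandwidth": 1.0}
as_receiver = accept_join(state, req_h, {"delay": 2.0}, {"delay": 2.5}, False, kinds)
as_source = accept_join(state, req_h, {"delay": 2.0}, {"delay": 2.5}, True, kinds)
```

Here h is accepted as a pure recipient (3.0 + 2.0 ≤ 8.0) but rejected as a
source, because 6.0 + 2.5 exceeds the requested delay bound.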

3.3 QoSPIM 

The QoSPIM proposal presents two techniques to extend the PIM-SM protocol with 
QoS-awareness. In this work we focus in particular on the Tree 
Information-Based QoS Multicast (TIQM) protocol. TIQM may simultaneously satisfy 
bandwidth, delay and loss constraints. It exploits the service of an underlying link 
state protocol, which maintains updated information about the network 
topology and the resource availability. A host h that wants to join a group Q uses 
the link-state information to compute a subgraph G' ⊆ G obtained from G by 
pruning the links that do not have enough available bandwidth. Then, h runs 
Dijkstra's algorithm to characterize all the paths in G' connecting it to the 
nodes in T, and having a delay lower than req_h[delay]. h chooses the minimum 
delay path. As the link state protocol, although maintaining the dynamic 
metrics, could be temporarily inaccurate, h tests the current delay and loss metrics 
for that path, by sending a join request along the path using source routing. 
When the request arrives at an intree node, it is forwarded upstream along the 
tree up to R, and a local reservation is recorded if needed. Every router traversed 
by the request checks for the actual availability of the requested resources and 
performs the reservation. If the check fails, a negative ack is sent to h along the 
reverse path, to release the reserved resources. Otherwise, when the request is 
received by R, R sends an ack to h. 
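
The path computation performed by the joining host can be sketched as below,
assuming undirected links annotated with available bandwidth and delay. The
function name `tiqm_path`, the data layout and the example topology are
assumptions of this sketch; the subsequent probing of the chosen path with a
source-routed join request is not modelled.

```python
import heapq


def tiqm_path(links, h, intree, req_bw, req_delay):
    """Minimum-delay path from h to the nearest intree node, or None.

    links: dict (u, v) -> (available_bandwidth, delay), treated as undirected.
    """
    # Build G' by pruning the links without enough available bandwidth.
    adj = {}
    for (u, v), (bw, delay) in links.items():
        if bw >= req_bw:
            adj.setdefault(u, []).append((v, delay))
            adj.setdefault(v, []).append((u, delay))

    # Dijkstra on the delay metric, starting from h.
    dist, prev, done = {h: 0.0}, {}, set()
    heap = [(0.0, h)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in intree:                      # first intree node reached is the closest
            if d >= req_delay:
                return None                  # even the best path is too slow
            path = [u]
            while path[-1] != h:
                path.append(prev[path[-1]])
            return path[::-1], d
        for v, w in adj.get(u, []):
            nd = d + w
            if v not in done and nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None


links = {
    ("h", "a"): (2.0, 1.0),
    ("a", "t1"): (0.5, 1.0),   # pruned when 1.0 units of bandwidth are requested
    ("h", "b"): (1.5, 2.0),
    ("b", "t2"): (1.5, 2.0),
    ("a", "t2"): (1.0, 5.0),
}
result = tiqm_path(links, "h", {"t1", "t2"}, req_bw=1.0, req_delay=5.0)
```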

4 Analysis of the Described Algorithms 

In Table 1 we summarize the main characteristics of the studied protocols. The 
main drawback of QoSMIC is its dependence on a QoS-sensitive unicast routing. 
Both the QoSMIC and the QoSCBT protocols operate by checking whether 
enough resources are available along the path provided by the unicast routing 
between an intree router and the joining node. Since the unicast routing considers 
only static metrics¹, the consequence of that policy is that flows tend to converge 
on the paths with greater resource availability. When those paths are congested, 
no new service can be established until ongoing transmissions terminate and 
release the used resources. To some extent, QoSMIC overcomes this problem, 
thanks to its capability of testing alternative branches starting from different 
candidates. 

¹ Dynamic metrics are evaluated during the tree construction procedure. 

A critical aspect of QoSCBT is how the router status is initialized. For some 
metrics (e.g., the bandwidth availability), the initial status equals the static 
metric. The status is updated by each accepted join request, which results in a local 
resource reservation. Hence, dynamic metrics can be locally maintained. Other 
metrics, such as the delay, vary dynamically according to the traffic conditions. 
The QoSCBT authors propose to initialize the router status using the information 
gathered by the join request along its path. If that information is the 
quality-of-path experienced by the request itself, it concerns the direction opposite to 
that followed by the QoS data, and it does not provide any information about 
the traffic congestion on the path towards the joining node. If, by contrast, 
every router has to measure and piggyback over the request the estimates of its 
own quality-of-path towards the joining node, this charges the routers with an 
unacceptable overhead. Moreover, not all the routers may be capable of 
performing those measurements. On the other hand, an inaccurate estimate of 
the queueing delay results in a failure to provide the requested QoS. 

In QoSCBT, the lack of concurrency in managing join requests may increase 
the set-up latency observed by the recipients, particularly at the start of a session 
when most of the receivers join at almost the same time. Another 
QoSCBT drawback is that it is suitable only for applications having one 
active source at a time. 

Table 1. Comparison among QoSMIC, QoSCBT and QoSPIM 

                    QoSMIC                  QoSCBT                QoSPIM 
#metrics            1                       ≥ 1                   ≥ 1 
unicast             same metric             independent           link-state 
type of metrics     static / dynamic        dynamic               dynamic 
multipath           yes                     no                    yes 
tree                shared unidirectional,  shared bidirectional  shared unidirectional, 
                    source-based                                  source-based 
tentative state     soft                    hard                  hard/soft 
sources             concurrent              one at a time         concurrent 
concurrent JoinRq   yes                     no                    yes 
reservation state   soft                    soft                  hard/soft 
comm. complexity    O(N)                    O(N)                  O(N^2) 


Both the QoSMIC and the QoSCBT communication overheads are O(N). 
However, QoSCBT has a greater memory overhead because of the recording of the 
needed status information. The TIQM communication cost is O(N^2) due to 
the use of a link state protocol. Moreover, it has the greatest computation and 
memory overhead. The update of the status information must be performed 
upon either each accepted join request or each leave event. On the other hand, 
the link state information provides the routers with many alternative paths that 
can be tested until one with the appropriate characteristics is found. In fact, 
we expect TIQM to outperform QoSMIC in the probability of successfully 
grafting requesting recipients. 

Both QoSMIC and TIQM may build either shared unidirectional trees or 
source-based trees. In the former case, additional mechanisms are needed to 
reserve resources along the unicast path from each source to the tree root. 

5 Performance Evaluation 

The above considerations are confirmed by the experimental results. We 
implemented the algorithms in the framework of the NS-2 simulation package. We 
simulated a meshed network of 64 nodes, connected by optical links of 2 Mbps 
bandwidth and variable length in the range 50 to 100 km. QoS CBR traffic 
originates in the tree root; we performed our measurements with different transmission 
rates. The background, best effort traffic is uniformly distributed all over the 
network and uses on average 33% of the bandwidth resources. The packet 
size is 512 B. Bandwidth reservation is ensured by using the WFQ packet 
scheduling policy; the queue length is 20 packets. The DV-based unicast 
routing uses the link capacity as the quality-of-path metric. In the following, we 
report the results concerning QoSMIC and QoSCBT; the TIQM measurements 
are currently ongoing. We simulated an interval of 60 sec. 

Fig. 1. (a) Throughput and (b) end-to-end delay of the second group vs. offered load, 
for |Q| = 10 



We performed the first set of experiments with two QoS sources: the rate of 
the first source is 1 Mbps, while the rate of the second source varies between 0.4 
and 1.9 Mbps. Both algorithms consider the bandwidth availability as the unique 
QoS constraint. Each source forms its own group; groups partially overlap. The 
second group is created after the first one has been established. We performed 
experiments with different membership cardinalities; however, the number of 
recipients has no impact on the delivered QoS. 

Both algorithms guarantee the required bandwidth to the grafted recipients 
(Figure 1(a)). For both algorithms, the end-to-end delay (Figure 1(b)) increases 
when approaching network congestion, while it drops when there is no space 
left for the best effort traffic (both source rates of 1 Mbps), thanks to the pipeline 
effect. When the second source rate increases, both algorithms successfully 
perform access control: only the destinations that can be grafted via non-overlapping 
branches join the groups. Hence, on those branches, QoS traffic competes with 
the best effort traffic only. The observed fair delay and jitter are likewise 
comparable for the two algorithms. 

Fig. 2. (a) Number of control messages vs. number of destinations, for different QoS 
traffic loads. (b) Estimated and measured end-to-end delay for the different 
destinations, with delay constraint of 7.2 msec 



By contrast, the algorithms differ in terms of costs. In QoSMIC, the 
need to wait for the bids sent by the candidates affects the latency observed 
by the joining nodes, if Δt is far greater than the round-trip delay between the 
joining node and the farthest intree node. In the opposite case, the joining node 
might not receive all the bid messages, thus possibly reducing its probability of 
successfully grafting to the tree. QoSMIC also has a greater communication 
overhead than QoSCBT. In Figure 2(a), we report the control messages generated to 
form the second source tree for both algorithms, with respect to the number of 
recipients. For a low rate of the second source (0.6 Mbps), all the destinations are 
grafted. As QoSMIC characterizes several candidates, each of which sends its 
own bid, its overhead grows more than linearly with the number of recipients. 
On the other hand, for increasing rate, the testing of alternative branches allows 
QoSMIC to graft around 10% more recipients than QoSCBT. In the case of 
network congestion (offered load: 2.3 Mbps), very few intree nodes are candidates, 
and the QoSMIC overhead drops, unlike that of QoSCBT, which has to send 
explicit rejection messages to remove the tentative reservations. 

We performed experiments with QoSCBT considering both bandwidth and 
delay constraints. A request is accepted if the estimated transmission and 
propagation delay over the path from R to the joining node is lower than the required 
threshold. Queueing delays are not considered. In Figure 2(b), we show the 
values of the estimated and measured end-to-end delay for each recipient, with 
only one source generating traffic at 1.6 Mbps, and uniform background 
traffic as before. With the considered threshold of 7.2 msec, 6 out of 20 receivers 
have not been grafted. Because of the contention with the best effort traffic and 
the consequent queueing delay, some destinations (4, 5, 12, 16) are accepted 
although their actual delay is greater than the threshold. The use of priority packet 
scheduling does not help in smoothing those peak queueing delays. If the delay 
bound imposed by the application is strict, the implemented architecture must 
be enhanced by adding mechanisms to monitor the queueing delay. Nevertheless, the 
achieved delay profile is more homogeneous than without any delay constraint 
(Figure 3(a)). This is reflected in the fair delay as well: the lower the delay bound 
is, the lower the number of grafted recipients is, and the smaller the difference 
among the delays they perceive is. On the other hand, as delay is an additive 
metric, the join requests are checked by all the nodes on the path from R to the 
joining node², thus increasing the communication overhead (Figure 3(b)) and the 
latency (by roughly 1 msec). 

Fig. 3. (a) Average delay vs. source rate for |Q| = 20 and different delay constraints. 
(b) QoSCBT communication overhead vs. number of considered metrics, with source 
rate 0.6 Mbps and different group cardinalities 

6 Concluding Remarks 

In this work, we analyze the behaviours of three QoS-sensitive multicast routing 
protocols. All the algorithms build a shared tree, and are receiver-driven, thus 
supporting heterogeneous recipients. The study of their characteristics and the 
simulation results show that a trade-off exists between the algorithm costs and 
the probability of characterizing a suitable tree. In particular, this probability 
grows with the capability of exploring multiple paths towards a given receiver. 
Possibly, source-based trees could further increase that probability, by allowing 
a better traffic distribution all over the network, at the expense of a larger 
state maintained by the nodes. We will investigate this issue by performing 
experiments with TIQM, which supports source-based trees as well. 

All the algorithms require the modification of the current Internet routers. 
The choice of which algorithm to adopt depends on several issues, such as (i) 
the application characteristics (e.g., one or more sources simultaneously active) 
and requirements (e.g., one or more QoS constraints), and (ii) the dependences 
on the lower layer services. 

² Rather than only by the first encountered intree node, as in the case of bandwidth. 

Our future work involves the investigation of techniques to perform queueing 
delay estimation, so that QoSCBT is more accurate in fulfilling the application 
requirements, without greatly increasing the overhead. Moreover, we are trying 
to reduce the dependency of QoSMIC and QoSCBT on the unicast routing. 



Abstract. Multistage interconnection networks are often proposed to 
establish a multiprocessor system, ATM switches, or Ethernet switches. 
Various MIN structures exist to improve the performance. This paper in- 
vestigates the buffer structures of shared and non-shared buffers in case 
of packet multicast. Shared buffers perform a dynamic buffer allocation 
but require a more complex switch management. The different behavior 
concerning uniform and non-uniform network traffic is examined. The 
simulation model copes with networks of arbitrary size, arbitrary switch- 
ing element sizes, arbitrary buffer lengths in each network stage, and an 
arbitrarily chosen network load. Additionally, arbitrary multicast traffic 
patterns can be handled. 



1 Introduction 

Multistage interconnection networks (MINs) with the Banyan property are 
proposed to connect a large number of processors to establish a multiprocessor 
system. They are also used as interconnection networks in Ethernet and 
ATM switches. Such systems require high performance of the network. To 
increase the performance of a MIN, Dias and Jump inserted a buffer at each 
input of the switching elements (SE) and developed an analytical model to 
predict its performance. Buffers at each SE allow the packets of a message to be stored 
until they can be forwarded to the next stage in the network. In their model, 
Dias and Jump reduced each stage in the network to one SE of this stage so that 
it could be mapped to a Markov chain. 

* Dietmar Tutsch is supported by the German Academic Exchange Service (DAAD) 
within the ICSI Postdoc Program 

Jenq introduced a model with lower complexity than that of Dias and 
Jump by considering only one input port of an SE per stage to model the complete 
stage. Yoon, Lee and Liu extended Jenq’s model by using arbitrary buffer 
lengths in the network and arbitrary SE sizes. Atiquzzaman and Akhtar and 
Zhou and Atiquzzaman examined nonuniform traffic like hot spot traffic. 
There are few investigations on multicast routing in MINs and on the 
structure of multicast ATM switches. An analysis of multicasting in MINs 
is presented by Yang. But in contrast to the other models, this model is not 
able to deal with the backpressure mechanism to handle full buffers. 

Tutsch and Hommel extended Jenq’s model such that the analytical 
model additionally copes with performance analysis of a network with 
multicasting. Multicasting includes the two special cases of unicasting and broadcasting 
of messages. Furthermore, the performance of MINs consisting of switching 
elements larger than 2×2 can be evaluated. In case of store and forward routing, a 
transient performance evaluation is available. 

In this paper, a simulation model is used to investigate a shared memory 
approach in contrast to the previously mentioned network architectures. Packet 
multicast is taken into account. The paper is organized as follows. In Section 
2, the architecture of a multistage interconnection network with shared buffers 
is introduced. The simulation model of such a network is developed in Section 
3. This model is used to investigate the performance of shared and non-shared 
network buffers. The final section summarizes the research. 



2 MIN Architecture 

Simulation models of MINs allow a QoS (quality of service) comparison of 
various network architectures. MINs of special interest are those that connect 
multiprocessor systems or establish ATM and Ethernet switches. These 
internally clocked N×N MINs consist of c×c switches with n = log_c N stages (Figure 
1). Internal clocking results in synchronously operating switches. In each stage 
k (0 ≤ k ≤ n − 1) of non-shared buffer networks, there is a FIFO buffer of 
size m_max(k) in front of each switch input. The packets are routed by store 
and forward routing or cut-through switching from one stage to the succeeding one 
using a backpressure mechanism. Multicasting is performed by copying the packets 
within the c×c switches while routing (cell replication while routing, CRWR). 
Each packet copy is sent to the desired switch output independently of the other 
copies, even if another copy is blocked. These blocked copies are sent in the 
following clock cycles. 
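
As a small worked example of the relation n = log_c N, the number of stages can
be computed with integer arithmetic. The helper name `num_stages` is an
assumption of this sketch.

```python
def num_stages(N, c):
    """Number of stages n = log_c N of an N x N MIN built from c x c switches."""
    n, size = 0, 1
    while size < N:
        size *= c
        n += 1
    if size != N:
        raise ValueError("N must be a power of c")
    return n


stages_16_2 = num_stages(16, 2)   # 16x16 network of 2x2 switches
stages_64_4 = num_stages(64, 4)   # 64x64 network of 4x4 switches
```

A 16×16 network of 2×2 switches thus has four stages.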

Networks consisting of shared buffers are established by replacing the c FIFO 
input buffers of size m_max(k) of a c×c switch by one common buffer of size 
c · m_max(k) (Fig. 2). This shared buffer is organized as follows: each switch 
input owns at least the buffer space to store one packet, avoiding the isolation of 
inputs (see below). The remaining buffer space of c · m_max(k) − c packets is 
available to all inputs. Each input forms a FIFO input queue of packets. If an 
input receives a new packet from the previous stage that has to be stored, the 
input allocates buffer space from the commonly used buffer part if available. If 
there is no further buffer space available, the packet is blocked at the previous stage. 

An input with a queue of more than one packet deallocates buffer space if it 
sends a packet to the next network stage. This space is returned to the pool of 
the commonly available buffer. 

Fig. 1. 3-stage non-shared buffer MIN consisting of c×c SEs 

Fig. 2. 4×4 switch consisting of a shared buffer (left) and non-shared buffer (right) 

Guaranteeing at least one buffer space to each input prevents an input from 
being left without any buffer: such an input could not participate in the switch 
routing process because it would not be able to receive a packet that has to be 
forwarded. For example, let us assume that one of the inputs (the hot spot input) 
receives many more packets than the other ones. Without the guarantee, this 
input could allocate up to all of the buffers. Packets of the previous stage 
that are directed to the other inputs would then be blocked at the previous stage 
even if their final destination is different from that of the first packet queued at 
the hot spot input. Only the hot spot input would contribute to the switch traffic and 
all other inputs would remain idle. 
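
The allocation rule of the shared buffer can be sketched in Python as follows.
This is a simplified illustration with assumed names (`SharedBufferSwitch`,
`accept`, `send`); it models only the buffer accounting of one switch, not the
routing or the clocking.

```python
from collections import deque


class SharedBufferSwitch:
    """Buffer accounting of one c x c switch with a shared buffer."""

    def __init__(self, c, m_max):
        self.queues = [deque() for _ in range(c)]   # one FIFO queue per input
        self.free_common = c * m_max - c            # pool usable by all inputs

    def accept(self, port, packet):
        """Store a packet at an input; False means it stays blocked upstream."""
        queue = self.queues[port]
        if not queue:                   # the slot reserved for this input is free
            queue.append(packet)
            return True
        if self.free_common > 0:        # otherwise take a slot from the common pool
            self.free_common -= 1
            queue.append(packet)
            return True
        return False

    def send(self, port):
        """Forward the head packet of an input queue to the next stage."""
        queue = self.queues[port]
        packet = queue.popleft()
        if queue:                       # queue held more than one packet:
            self.free_common += 1       # a common slot returns to the pool
        return packet


switch = SharedBufferSwitch(c=2, m_max=2)
hot = [switch.accept(0, p) for p in ("p1", "p2", "p3", "p4")]   # hot spot input
other_first = switch.accept(1, "q1")    # reserved slot is still available
other_second = switch.accept(1, "q2")   # common pool is exhausted
```

In the example the hot spot input takes its reserved slot and the whole common
pool, yet the other input can still store one packet.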

Additionally, the following assumptions hold for the presented simulation 
model. However, most of these assumptions can be changed to obtain further interesting 
network realizations with little effort due to object-oriented modeling. This can 
be done by replacing or subclassing the desired components of the 
object-oriented simulation model. The network structure is described by one class while 
the routing, queuing and various other behaviors of the model are encapsulated 
in further classes. So the effort to change the network is minimized. 
The assumptions of the presented model are: 

— All packets have the same size (like in ATM). 

— Their destination outputs are distributed uniformly. That means every out- 
put of the network is with equal probability one of the destinations of a 
packet. 

— Conflicts between packets are solved randomly with equal probabilities. 

— Packets are removed from their destinations immediately upon arrival. 

— Routing is performed in pipeline manner. That means the routing process 
occurs in every stage in parallel. 

3 Buffer Structure Comparison 

Previously mentioned MINs are simulated for performance evaluation. Networks 
consisting of switches with shared and non-shared buffers are compared. 

The simulation model is implemented in C++. It handles most kinds of 
network structures that are based on c×c switches but is optimized to model 
MINs. The network is represented as a directed graph starting at the sources 
(network inputs) and ending at the destinations (network outputs). Packets are 
generated at the sources. Each packet is provided with a tag determining the 
destination. Due to multicasting this tag is modeled by a vector of N binary 
elements, each representing a network output. The elements of the desired outputs 
are set to “true”. If the packet arrives at a c×c switch, the tag is divided into c 
subtags of equal size. Each subtag belongs to one switch output, the first (lower 
indices) subtag to the first output, etc. If a subtag contains at least one “true” 
value, a copy of the packet is sent to the corresponding output containing the 
subtag as the new tag. 
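
The tag handling at a single switch can be sketched in Python (the simulation
model itself is written in C++; the function name `route_copies` and the example
tag are assumptions of this sketch).

```python
def route_copies(tag, c):
    """Split a destination tag into c subtags and return the copies to send.

    Returns a dict: switch output index -> subtag carried by the copy.
    Outputs whose subtag contains no True value receive no copy.
    """
    size = len(tag) // c
    copies = {}
    for out in range(c):
        subtag = tag[out * size:(out + 1) * size]
        if any(subtag):
            copies[out] = subtag
    return copies


# Multicast packet in an 8x8 network of 2x2 switches, destined to outputs 1 and 6.
tag = [False, True, False, False, False, False, True, False]
stage0 = route_copies(tag, 2)          # both switch outputs receive a copy
stage1 = route_copies(stage0[0], 2)    # upper copy: only the first output
stage2 = route_copies(stage1[0], 2)    # last stage selects network output 1
```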

To keep the amount of allocated memory as small as possible, only 
representations of the packets, referred to as containers, are routed along the 
network paths. The containers are replaced by the packets at the network outputs, 
allowing evaluations. Figure 3 gives a short sketch of the simulation model. 
So-called ContainerMultiputs (CM) receive the containers and store them 
in the queues. At the first network stage, FirstContainerMultiputs (FCM) 
additionally perform the replacement of the packets by containers. So-called 
ContainerOutputs (CO) send the containers to the next network stage. At the 
last stage, LastContainerOutputs (LCO) additionally replace the containers by 
the corresponding packets. Each operation of a switch is aligned by its Crossbar 
Manager. The clocks perform the sequencing of the parallel actions required by 
computer simulation. The Deadlock Manager is only needed in case of multicast and 
wormhole routing. Such a scenario, which is not the subject of this paper, may cause 
deadlocks. 

Fig. 3. Sketch of the simulation model 



Simulations are performed by starting multiple simulation runs in parallel and 
using a confidence level of 0.95 and a relative error of 0.02 as termination criteria. 
The simulation is observed and managed by the tool Akaroa. 

All following figures identify the results of non-shared buffer networks by a 
legend “non-shared buffer: x, total z” where x represents the buffer size of each 
FIFO buffer and z = c · x gives the overall buffer size of the switch. Shared buffer 
networks are identified by a legend “shared buffer: min v, max w, total z” where 
v represents the minimal buffer size of each input, w the maximal buffer size of 
each input, and z gives the overall buffer size of the switch. 

The figures show the average throughput and delay times of 16×16 MINs 
consisting of four stages of 2×2 switches. The packets are routed by cut-through 
switching. 

First, completely uniform network traffic is investigated: the offered load to 
all inputs is equal. Concerning multicasting, all output combinations occur with 
equal probability as the destination of a packet entering the network. Figure 
4 shows the dependence between offered load and average throughput at the 
outputs in case of uniform network traffic. Increasing the offered load from 0.01 to 
1.0, the network reaches congestion for an offered load greater than approx. 0.14 
due to the large number of packets caused by multicasting. 

Fig. 4. Uniform network traffic 

Comparing the networks with shared and non-shared buffers of an overall 
buffer size of 8, no observable difference in throughput occurs. In case of uniform 
network traffic the buffer structure does not affect the throughput. However, 
larger buffers result in higher throughput. 

In the following, uniform traffic is replaced by merging sources sending uni- 
cast traffic and sources sending broadcast traffic. 

First, traffic established by one broadcast source is investigated. The offered 
load of this source is varied from 0.01 to 1.0. All other sources send unicast traffic 
to the network with a fixed offered load of 0.2. Figure 5 shows the throughput of 
the merged traffic at the outputs for various buffer sizes and structures. Shared 
buffers achieve a higher throughput than non-shared buffers of the same size: 
buffer space is more efficiently used. On the other hand, a more efficiently used 
buffer results in larger packet queues at the switch inputs: higher delay times 
occur (Figure 6). 

A further investigated traffic pattern is similar to the previously mentioned
one except that two broadcast sources are used. These sources may
be located at various network inputs. Depending on their locations, the first
conflicts between their packets will occur in different network stages. E.g., if
they are located at the first two inputs, they are already in conflict at the first
network stage because they are located at the same switch. If they are located at
the first and last input, the first crossing of their packets’ network paths occurs
no earlier than the last stage.

Fig. 5. One broadcast source (throughput)

Fig. 6. One broadcast source (delay)

Figure 7 shows the throughput taking the stage of the first conflict into
account. A non-shared switch input buffer of size 1 is chosen. The sooner the network
paths of both broadcast sources cross, the lower the network throughput. E.g.,
if the crossing and therefore the first conflict occurs at the first stage (stage 0),
the output of this switch equals the output in case of just one broadcast source
at the inputs: both sources send a packet to both outputs but no more than one
packet can pass an output. Just one broadcast source would also result in one
packet passing each of both outputs.

If MINs are fed by more than one high-load source, the sources should be
placed in such a way that their paths cross as late as possible.

Fig. 7. Two broadcast sources

Fig. 8. First conflict at last stage

A comparison of shared and non-shared buffer structures depending on the
stage of the first conflict is presented in Figures 8 and 9. Figure 8 demonstrates
the throughput behavior for two broadcast sources that cause their first conflict
at the last stage. The throughput behavior is given for various buffer structures.
Additionally, the dotted line allows a comparison to a source distribution
resulting in a first conflict at the first stage.

Broadcast sources that cause their first conflict at the first stage are evaluated
in Figure 9. This figure also shows the throughput of various buffer structures.
The dotted line gives the throughput of a source distribution resulting in
a first conflict at the last stage.

All figures show a higher throughput for network switches with a shared
buffer compared to switches with non-shared buffers. However, the throughput
increases from non-shared to shared buffers by only a small amount and only in
case of network congestion. This increase comes at the cost of more complex switch
hardware for managing the shared buffer.

Fig. 9. First conflict at first stage

4 Conclusion 

Multistage interconnection networks are often proposed to establish a multipro- 
cessor system, ATM switches, or Ethernet switches. Various MIN structures exist 
to improve the performance. This paper compares the two buffer structures of
shared and non-shared buffers in case of packet multicast. Shared buffers perform
a dynamic buffer allocation. Space for at least one packet is reserved for
each input, which avoids the isolation of inputs.

In case of uniform network traffic, both buffer structures show identical be- 
havior. Non-uniform traffic causes a slightly higher network throughput if shared 
buffers are used. On the other hand, shared buffers require a more complex switch 
management. 

If the network operates at high traffic load, e.g. caused by multicasting,
the network stage of the first crossing of high-load paths heavily influences the
network behavior.
Abstract. A PIM-SM-built multicast tree must be restructured/recovered
when the underlying unicast routing tables change. In this article we
describe the PIM-SM recovery mechanisms and evaluate the recovery 
performance, showing its dependence on a range of network and session 
parameters. Our results show that a significant recovery performance 
improvement is possible if the multicast recovery is immediately triggered 
when the unicast routing state changes. Furthermore, our results show 
that a substantial packet loss can be caused by non-reductive, “benign” 
events in the network, such as an addition of a new link. 



1 Introduction 

Stephen Deering's Ph.D. dissertation and the ensuing work in the IETF on multicast
protocols were the foundation for IP multicast. The subsequent
establishment of the Mbone positioned IP multicast as an emerging, powerful
IP technology supporting a range of new, primarily multimedia applications. To
address the inherent scalability problems of this technology, “Protocol Independent
Multicast - Sparse Mode” (PIM-SM, [3]) was developed, and it is the most
widely used multicast routing protocol today.

PIM-SM creates and maintains unidirectional multicast trees based on ex- 
plicit Join/Prune protocol messages. These control messages are sent on a node- 
to-node basis. PIM is “protocol independent” in the sense that it is independent 
of the underlying unicast protocol — it can run on top of any unicast routing 
protocol. 

To build a multicast tree, PIM multicast routers use a mechanism called
Reverse Path Forwarding (RPF). RPF determines the direction to the root of the tree
using the unicast routing tables. This information is used to select the interface on
which Join/Prune messages are sent, and on which the multicast packets originating
at the root are expected to arrive. Based on received Join/Prune messages,
routers maintain a set of mappings between the input interface and the output
interfaces for each known multicast group.

In case of a unicast routing change, all multicast routing entries are reexamined
using the RPF mechanism in order to determine the (possibly) new input
interface. We call this process of reestablishing the multicast tree tree recovery.
If the new input interface differs from the old one, the multicast routing entry is
updated: the new input interface replaces the old one and is removed from the
output interface list, if it was in it. Finally, control
messages are sent to the neighboring routers: a Join on the new input interface
and a Prune on the old input interface, if it is operational. In the transient phase,
from the unicast routing change to the stabilization of the new multicast tree,
packet loss may occur.
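The recovery step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PIM-SM specification or the authors' simulator code: the names `MulticastEntry`, `rpf_interface` and `recover_entry`, the interface names and the group address are all hypothetical, and the unicast table is reduced to a plain mapping from a tree root to the outgoing interface.

```python
from dataclasses import dataclass, field


@dataclass
class MulticastEntry:
    """One multicast routing entry: an input interface and a set of output interfaces."""
    group: str
    root: str
    iif: str
    oifs: set = field(default_factory=set)


def rpf_interface(unicast_table, root):
    """RPF lookup: the interface the unicast routing table uses toward the tree root."""
    return unicast_table[root]


def recover_entry(entry, unicast_table, operational):
    """Re-examine one entry after a unicast change and return the control messages to send."""
    new_iif = rpf_interface(unicast_table, entry.root)
    if new_iif == entry.iif:
        return []  # input interface unchanged: no update, no messages
    old_iif = entry.iif
    entry.iif = new_iif
    entry.oifs.discard(new_iif)  # the new input interface must not stay in the output list
    messages = [("Join", new_iif)]
    if old_iif in operational:  # a Prune is only sent if the old interface still works
        messages.append(("Prune", old_iif))
    return messages


# Hypothetical scenario: eth0 (the old input interface) has failed,
# and unicast routing now reaches the root via eth1.
entry = MulticastEntry(group="224.1.1.1", root="rp", iif="eth0", oifs={"eth1", "eth2"})
msgs = recover_entry(entry, {"rp": "eth1"}, operational={"eth1", "eth2"})
print(msgs)
print(entry.iif, sorted(entry.oifs))
```

In this scenario only a Join is produced, because the old input interface is no longer operational. The old interface is dropped at the same moment the new one is installed, which is the behavior the paper later identifies as the cause of loss on benign events.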

PIM-SM has received substantial attention in the research community.
Also, significant research has been done on application-level error recovery for
real-time IP multicast and reliable multicast applications. However, there
has been less attention on multicast tree recovery at the network level. Wang
et al. focussed on the performance of fault recovery in PIM Dense Mode
running over OSPF. In addition, they analyzed the qualitative aspect of fault
recovery of PIM running over OSPF. Our work extends these results and focuses
on the performance of PIM-SM recovery.

2 Problem Statement 

The performance of PIM-SM tree recovery is influenced by a range of factors,
including the network topology properties (e.g. average node degree, link delay),
multicast session properties (e.g. group size and data flow properties) and routing
mechanisms (e.g. unicast routing protocol and multicast recovery initiation
method).

In this paper we explore the effect of these parameters on the performance of
PIM-SM recovery. In particular, the multicast recovery initiation can be based on
periodic polling of the unicast routing tables (periodic recovery), or on receipt
of an explicit change notification from the unicast routing process (triggered
recovery). The periodic recovery is more common in practice, since it does not
assume that the unicast routing is aware of the multicast routing. In our work
we analyze performance and cost aspects of both mechanisms.

The unicast routing changes are caused by events belonging to three broad
classes: Topology Reduction, e.g. link failure, link removal or node failure; Topology
Enrichment, e.g. link recovery or adding a new link; and Dynamic Routing
Change, e.g. link metric change. If a topology reduction has occurred, packet
loss is often inevitable, since it takes time to reconstruct the multicast tree using
alternative links. Intuitively, events belonging to the other two classes, called benign
events in the rest of this paper, should not cause any packet loss. However,
the standard PIM-SM recovery procedure implies that, in the case of a changed
input interface, the old input interface is immediately disabled. In other words,
events such as enrichment of the network by a new, operational link can also
cause multicast packet loss. In this paper we evaluate the PIM-SM recovery performance
both in the case of topology reduction (link failure) and a benign event
(link recovery).
3 Performance Evaluation

We have developed a simulation model of PIM-SM using the Network Simulator
(NS) framework. The model provides a general implementation of
PIM-SM (routing based on explicit Join/Prune protocol messages, soft state
with periodic refresh etc.) and a detailed implementation of the PIM-SM recovery.
The model is parameterized through a range of parameters including the
average node degree, link delay, group density, CBR source rate etc. The unicast
routing is based on NS's standard distributed implementation of the Distance
Vector protocol. We use random network topologies constructed to reflect
real transport networks.

In each simulation instance, after the multicast distribution tree has stabilized
and the source has started to send data, a randomly chosen link within the
multicast tree is taken down. We call this the “link-down” event. After the
multicast tree has recovered, the link is reintroduced in the network (“link-up”
event). We measure the packet loss at the receivers caused by these events.

To evaluate the effect of the different parameters on PIM-SM recovery performance,
we conduct a set of simulations where the parameters are varied within
anticipated real network values. The following parameter ranges are chosen: recovery
mechanism (periodic p=20ms, periodic p=50ms or triggered), average
node degree (D={2.5, 3.0, ..., 5.0}) and group density (5, 10, 15, 20 receiver
nodes out of 30 in the network). The average link delay in all test networks is
3ms, the bandwidth is 10Mb/s, the CBR rate is 500 packets/second and the packet
length is 320 bytes.
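To make the size of this parameter space concrete, the following sketch enumerates the combinations listed above and derives the bit rate of the CBR source from the stated packet rate and length. The variable names are illustrative assumptions; the paper does not describe how the simulation runs were scripted.

```python
from itertools import product

# Parameter ranges as stated in the text.
recovery_mechanisms = ["periodic p=20ms", "periodic p=50ms", "triggered"]
degrees = [2.5 + 0.5 * k for k in range(6)]  # 2.5, 3.0, ..., 5.0
group_sizes = [5, 10, 15, 20]                # receivers out of 30 nodes

# Every combination of mechanism, average node degree and group size.
grid = list(product(recovery_mechanisms, degrees, group_sizes))
print(len(grid), "parameter combinations")

# Bit rate of the CBR source: packets/second * bytes/packet * 8 bits/byte.
cbr_bits_per_second = 500 * 320 * 8
link_bits_per_second = 10_000_000
print(cbr_bits_per_second / 1e6, "Mb/s offered by the source")
print(cbr_bits_per_second / link_bits_per_second, "of the link bandwidth")
```

The grid has 3 x 6 x 4 = 72 combinations, and the source offers 1.28 Mb/s, well below the 10 Mb/s link bandwidth.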

3.1 Performance Evaluation Basis 

In this subsection we first analyze unicast recovery. We find the average packet
loss in a unicast data flow when a link goes down, under the same conditions as in
the forthcoming multicast study. We will use these results as a comparison for the
multicast recovery performance. Furthermore, we present how many multicast
receivers are affected by the tree recovery and how often the packet loss occurs.
These data are significant for a proper evaluation of the multicast packet loss
figures presented later in this section.

We believe that it is only of interest to consider simulation instances where
tree recovery after link-down is possible. Hence, we do not consider simulation
instances where the link-down event resulted in a disconnected topology.



Unicast Loss. Our study is performed in networks using a unicast routing 
protocol based on the Distance Vector (DV) algorithm. When DV is used, each 
node has sufficient information to immediately repair the failed route if and only 
if the alternative route is two hops long. If three or more hops are necessary, the 
upstream node will discard the packets while the routing updates are exchanged 
and the routing state converges. This period will be longer in sparse networks 
due to longer alternative routes. 
If a Link State (LS) protocol implementation is used, the unicast packet
loss is not dependent on the average node degree. The unicast routing update
flooding starts as soon as the link failure is detected. After receiving the update,
each node can calculate the alternative route instantly. Therefore the packet loss
will occur mainly due to the loss of packets traversing the faulty link, and it will
be lower than the corresponding one for a DV routing protocol.

We have measured the DV unicast recovery performance as the packet loss in
a unicast flow with the same properties as the tested multicast flow (10Mb/s CBR
flow, 500 packets/second, 320-byte packets). The unicast flow traverses the same
faulty link as the multicast flow presented in the remainder of this section.



Fig. 1. Unicast packet loss, depending on the average node degree (left: Distance
Vector, right: Link State). The expected on-link loss is 1.5 packets because NS
excludes the link transmission time in its loss model



Figure 1 (left) shows the Distance Vector unicast loss depending on the average
node degree. Our results provide a good illustration of the quick recovery in
highly connected networks. In 30-node networks with an average node degree
of 5, the probability of having a two-hop alternative path for a link failure is
very high. Hence, the DV packet loss for degree 5 is expected to be just above
the estimated minimum loss of the packets traversing the link:

$L_{min} = R \cdot (d_p + d_t) = 500\,\mathrm{s}^{-1} \cdot 0.003256\,\mathrm{s} = 1.628$    (1)

where $R$ is the packet rate, $d_p$ is the 3 ms link propagation delay and $d_t$ is the
0.256 ms packet transmission time. In our simulation environment the minimum
loss is even lower than the estimated minimum since the NS loss model
implementation excludes the packet transmission time.
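The numbers in this estimate follow directly from the simulation parameters. The short sketch below recomputes them; the symbol `d_t` for the transmission time and all variable names are assumptions made for illustration.

```python
# Parameters stated in the text.
packet_rate = 500        # packets per second (R)
d_p = 0.003              # link propagation delay in seconds (3 ms)
packet_bits = 320 * 8    # 320-byte packets
bandwidth = 10e6         # link bandwidth in bits per second (10 Mb/s)

# Transmission time of one packet on the link.
d_t = packet_bits / bandwidth

# Estimated minimum loss: packets in flight on the link when it fails.
l_min = packet_rate * (d_p + d_t)

# Loss expected in NS, whose loss model excludes the transmission time.
l_ns = packet_rate * d_p

print(f"d_t   = {d_t * 1000:.3f} ms")
print(f"L_min = {l_min:.3f} packets")
print(f"L_NS  = {l_ns:.3f} packets")
```

The transmission time comes out at 0.256 ms and the two loss values at 1.628 and 1.5 packets, matching the estimate in the text and the on-link loss quoted in the caption of Fig. 1.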

As expected, the LS packet loss is independent of the average node degree
(Fig. 1, right).
Affected Receivers. A link failure within the multicast tree will always affect
at least one receiver. It is important to present how many receivers are affected
in order to gain a complete view of the recovery performance.

The multicast trees are higher (more hops on average from the source to each 
of the receivers) in the networks with low connectivity than in the networks with 
high connectivity. Therefore, a link failure is more likely to affect a receiver in a 
network with low connectivity than in a network with high connectivity. 

Figure 2 (left) shows the average number of receivers affected by a single
failure. For example, 33% of the receivers are affected in networks with the average
node degree D=2.5 and with group size 5, and only 10% in D=5.0 networks
with 20 receivers.

Sources for Packet Loss. Packet loss may occur due to both link-down and
link-up events. The link-down event causes loss in 95% of cases in the triggered
recovery and almost always in the periodic recovery, regardless of the network
parameters. The link-up event, however, causes packet loss only if the input
interface has changed, which is more probable in low connectivity networks. This
is the case since the alternative paths in the high connectivity networks will in
general be shorter and “closer” to the original path, i.e. achievable through a change
of output interfaces in the transit nodes only.

The link-up event causes loss in ~50% of cases in D=5 networks, rising to 80% of
cases in D=2.5 networks (Fig. 2, right).




Fig. 2. Consequences of tree recovery: mean number of affected receivers (left) and events
causing the packet loss out of 1000 simulation instances (right)
3.2 Link-Down Event

When a branch is removed from the multicast tree, the downstream nodes will
be cut off until an alternative route is established. The total cutoff time $T$ is
bounded:

$T_m \le T \le T_u + p + T_m$

where $T_u$ is the time it takes to recover the unicast routing, $p$ is the unicast
routing check period (20ms and 50ms in our simulations, zero for the triggered
recovery) and $T_m$ is the multicast routing recovery time (upstream propagation
of the Join messages to the closest node in the tree).

$T_u$ is shorter for larger average node degrees. $T_m$ decreases as the probability
of a nearby in-tree node increases, influenced by the number of receivers and the
average node degree.
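As a rough illustration of how the check period enters the worst case, the sketch below evaluates the bound for the three recovery mechanisms. It reads the bound as T_m <= T <= T_u + p + T_m, which is an assumption about the intended formula, and the values used for T_u and T_m are hypothetical placeholders rather than measurements from the paper.

```python
def cutoff_bounds(t_u, t_m, p):
    """Return (lower, upper) bounds on the cutoff time T, all values in milliseconds.

    Assumes the bound T_m <= T <= T_u + p + T_m.
    """
    return t_m, t_u + p + t_m


# Hypothetical recovery times in milliseconds (not taken from the paper).
T_U = 6   # unicast routing recovery time
T_M = 3   # multicast routing recovery time

# p = 0 models the triggered recovery; 20 and 50 are the simulated periods.
for name, p in [("triggered", 0), ("periodic p=20ms", 20), ("periodic p=50ms", 50)]:
    lower, upper = cutoff_bounds(T_U, T_M, p)
    print(f"{name}: {lower} ms <= T <= {upper} ms")
```

With these placeholder values the upper bound grows from 9 ms to 29 ms and 59 ms, so the check period quickly dominates the worst case once it exceeds the routing recovery times.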

We expect T to be much better on average than the worst case. First,
the expected time before the multicast recovery starts is p/2. Also, the unicast
recovery time is often included in this period. Furthermore, in the triggered
recovery the multicast recovery may start before the unicast routing is completely
recovered. This happens because the unicast routing recovery takes several routing
message exchanges to stabilize, while the multicast recovery succeeds quickly
since the neighboring node may be a member of the same multicast group. While
the unicast routing converges, the multicast input interface may temporarily
point in the wrong direction. This has no effect on the final multicast routing entry,
since it is always coherent with the unicast routing.

Our performance evaluation results are shown in Fig. 3. For the same recovery
mechanism and number of receivers, each sextuple of adjacent bars represents the
six average node degrees (2.5 to 5) we have tested. The standard deviation in
these measurements ranges from ~2.5 packets for the triggered recovery with 20
group members to ~7.5 packets for the periodic recovery with 5 group members.

The unicast loss pattern (Fig. 1, left) is recognizable in the charts for small
group sizes. For larger group sizes, the multicast recovery often succeeds before
the unicast routing is completely recovered due to the high probability of the neighbor
node being a member of the multicast group, thereby obscuring the unicast loss
pattern.

The effect of the node degree and the group size is shown in the characteristic
pattern where the performance improves by ~3 packets from D=2.5 networks
with 5 receivers to D=5.0 networks with 20 receivers, for both periodic and
triggered recovery.

We can observe that the loss performance is dominated by the unicast routing
check period p: the mean loss value for the triggered, periodic p=20ms and
periodic p=50ms recovery is 4.5, 8.8 and 15.8 packets, respectively. The difference
between the first two is 4.3 packets. For p=20ms, the expected time between the
link-down event and the recovery procedure initiation is p/2=10ms, or 5 packets. The 0.7
packet difference is caused by the overlap between the unicast and multicast
recovery.
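The arithmetic behind this comparison can be checked with a few lines of Python. The mean loss values and the packet rate are taken from the text; the figures for p=50ms are derived here in the same way and are not quoted by the authors. All names are illustrative.

```python
PACKET_RATE = 500  # packets per second, as in the simulations

# Mean packet loss per affected receiver, as reported in the text.
mean_loss = {"triggered": 4.5, "periodic p=20ms": 8.8, "periodic p=50ms": 15.8}


def expected_wait_packets(period_ms):
    """Packets sent during the expected wait of p/2 before recovery starts."""
    return PACKET_RATE * (period_ms / 2) / 1000


extra_loss = {}  # measured loss beyond the triggered recovery
overlap = {}     # part of the expected wait that is hidden by unicast recovery
for name, period_ms in [("periodic p=20ms", 20), ("periodic p=50ms", 50)]:
    extra_loss[name] = mean_loss[name] - mean_loss["triggered"]
    overlap[name] = expected_wait_packets(period_ms) - extra_loss[name]
    print(f"{name}: expected {expected_wait_packets(period_ms):.1f}, "
          f"measured extra {extra_loss[name]:.1f}, overlap {overlap[name]:.1f} packets")
```

For p=20ms this reproduces the 5 packets expected, the 4.3 packets measured and the 0.7 packet overlap discussed above.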
Fig. 3. Mean packet loss per affected receiver, link-down event. Each sextuple of adjacent
bars represents the six average node degrees: 2.5 (leftmost) to 5 (rightmost) links
per node, for group sizes 5, 10, 15 and 20 receivers

Fig. 4. Mean packet loss per affected receiver, link-up event. Each sextuple of adjacent
bars represents the six average node degrees: 2.5 (leftmost) to 5 (rightmost) links per
node, for group sizes 5, 10, 15 and 20 receivers

3.3 Link-Up Event

When a network link recovers, the unicast routing tables are updated for routers 
that have the link in a shortest path route. In our scenario, the unicast routing 
tables become the same as before the link-down event. The multicast routing 
process notices this benign event, and starts the recovery procedure in order to 
reestablish the better multicast tree. 

The PIM-SM recovery procedure implies that the old input interface on a
router is closed instantaneously when the new input interface is chosen. It takes
time for the multicast flow to propagate over the new branch. The packet loss
in this case depends on the propagation delay of the new branch, which has fewer hops
and a shorter delay in networks with high node degree.

The mean packet loss caused by the link-up event is shown in Fig. 4. The
packet loss is largely independent of the recovery period, since the old input
interfaces are operational and unchanged even though the unicast routing changes
in this period.



4 Overhead Comparison 

PIM-SM tree recovery includes the multicast routing table recalculation and the
exchange of Join/Prune control messages on the new links. These actions
cause additional router CPU consumption and increased network load,
respectively. We provide an estimate of how often the overhead is incurred for the
two recovery types (periodic/triggered).



Computational Overhead. Each time the unicast routing state has changed, 
an RPF check has to be done for each multicast routing entry. The triggered 
recovery will be invoked only when the changes have occurred, justifying the 
routing table processing. If Distance Vector unicast routing is used, the procedure 
can be invoked repeatedly as the unicast routing stabilizes. This will additionally 
stress the system in the transient phase. 

The periodic recovery needs to know if there have been any changes in the unicast
routing tables since the last invocation of the recovery procedure. If this is impossible,
because e.g. the unicast routing is unaware of the coexisting multicast
routing and does not release the last update information, the only implementable
solution is the least effective one: periodic recovery with unicast routing table
processing in each invocation.



Communication Overhead. In PIM-SM recovery the communication overhead
consists of two parts: the transmission of packets that are rejected due
to a wrong input interface and the Join/Prune messages triggered by multicast
routing changes. The PIM-SM standard specifies sending the Prune message on
the old input interface and merging Join/Prune messages for many multicast
groups, thereby minimizing the communication overhead.

The periodic recovery will send at most one Join/Prune message, and only
if the input interface has changed. The triggered recovery may send several
Join/Prune messages, as the DV routing stabilizes.



Repeated Recovery Invocations in Triggered Recovery. To understand 
the amount of the additional overhead in transient period in the triggered re- 
covery, we have counted the number of calls and the number of input interface 
changes in our simulator. We have tested D=3.0 topologies with a single, five- 
member group. 

The link-down event caused an average of 1.75 recovery procedure invocations
per multicast node. A maximum of 13 invocations was registered; however, 4
or fewer invocations were registered in more than 95% of cases. A maximum of
5 Join/Prune messages per node was sent. In 75% of cases the input interface
remained the same (zero Join/Prune messages sent), and in an additional 20% of cases
a single Join/Prune was sent.

This result shows that, in our DV simulation environment, the triggered 
recovery induced a 75% higher computational overhead and a slightly higher 
control message overhead, as compared to the periodic recovery. 
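One way to read the 75% figure, assuming that the periodic recovery serves as a baseline of one recovery invocation per node and event, is as the relative increase in invocations:

```latex
\[
  \frac{N_{\mathrm{triggered}}}{N_{\mathrm{periodic}}} - 1
  = \frac{1.75}{1} - 1 = 0.75 = 75\%
\]
% Rough bounds on the mean number of Join/Prune messages per node, assuming the
% remaining 5% of cases sent between 2 and 5 messages (the stated maximum):
\[
  0.75 \cdot 0 + 0.20 \cdot 1 + 0.05 \cdot 2 = 0.30
  \;\le\; \bar{n}_{\mathrm{J/P}} \;\le\;
  0.75 \cdot 0 + 0.20 \cdot 1 + 0.05 \cdot 5 = 0.45
\]
```

Both the baseline of one invocation and the bounds on the message count are inferences from the reported percentages, not values stated by the authors.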

Link State Routing. A Link State unicast routing protocol will receive at 
most one unicast routing update for each event (e.g. single link removal). This 
implies that at most one multicast recovery procedure invocation will occur, even 
in the triggered recovery. In other words, in LS-based networks, the triggered 
and the periodic recovery will have similar computational and communication 
overhead. 



5 Conclusion 

We have evaluated the PIM-SM recovery performance depending on the recovery
mechanism and various topology and session parameters. Packet loss occurs due
to both reductive and benign events. We simulated a reductive event as a link
failure (link-down event) and a benign event as the link recovery (link-up event).

The link-down event causes packet loss in at least 95% of cases in our test
environment, regardless of the other parameter settings. The triggered recovery
has superior performance as compared to the periodic recovery. The triggered 
recovery will in general have computational and communication overhead of the 
same order as the periodic recovery, but may not be implementable on some 
systems. Other factors (e.g. average node degree) have a moderate effect on the 
performance. In general, PIM-SM recovers quickly, showing performance close 
to the underlying unicast recovery. 

The packet loss caused by the link-up event is unnecessarily high, and can
be decreased using an improved recovery algorithm. Detailed specification and
analysis of this algorithm is the topic of our current research.
Abstract. We consider the problem of how to achieve a simultaneous
arrival of information at a multitude of recipients for applications where 
the receivers are non-cooperative. For that reason, we aim at designing 
an inter-receiver delay jitter fair service for Internet multicast delivery. 
In contrast to related work, we present an approach at application layer, 
which does not assume special properties of the core network nodes and 
can be partially deployed. Additionally, it implicitly takes current net- 
work load into account, which gives the opportunity to keep the overall 
message delivery delay low. Simulation results show the feasibility of the 
described algorithms and a significant reduction of inter-receiver delay 
jitter compared to normal multicast delivery.



1 Introduction 

Multicasting allows an efficient transmission of a message to a group of receivers. 
A multitude of new application areas are based on this transmission technique, 
e.g. news and software distribution, distributed computing and multimedia ap- 
plications like videoconferencing and teleteaching. Combined with current mul- 
ticast transport protocols, multicasting provides a suitable distribution platform 
for information. However, as information becomes more and more valuable and 
therefore instant reaction to information is necessary, subscribers to information 
services would like to be assured that the provided information is not delayed 
with respect to other recipients. For example, assume a system that provides 
sensitive stock quote information. The sooner a recipient gets this information 
the more money he is likely to earn. Especially if the receivers have to pay for this 
channel they do not tolerate delay unfairness. Hence, a service is required that 
achieves fairness regarding the delay jitter among receivers, which is not given in 
the current Internet. We will call this property delay-fairness for short. Besides 
information services, other applications like electronic markets and distributed 
games would benefit from a delay-fair service, too. 

The approach presented in this paper improves delay fairness by using servers
in the network to smooth message reception time differences at multicast receivers.
Our approach is based on a realistic system model of the current Internet.
Though it is not possible to guarantee absolute delay fairness without reservation
of bandwidth and without router support, our simulation results show a significant
decrease of inter-receiver delay jitter. As our approach is completely at
application layer and therefore needs no router modifications, it can be deployed
partially and in a short time.

The paper is structured as follows. First, we define the problem and discuss
related work. In Section 3 we present our server-based approach to improve delay
fairness. Simulation results are presented in subsection 3.4. Finally, the last section
concludes the paper with a brief summary.

2 Problem Definition and Related Work 

We assume a sender $s$ and a set of receivers $R = \{r_1, \ldots, r_j, \ldots, r_n\}$ of a multicast
group. The sender sends messages $m_1, \ldots, m_i, \ldots$ to that multicast group. The
delay that is experienced by message $i$ on its way from sender $s$ to receiver $r_j$
is denoted by $D_{ij}$. The inter-receiver delay jitter of message $i$ is the difference
between the first reception of message $i$ at a receiver and the last reception of
the same message at another receiver:

$J_i = \max\{\forall r_j \in R : D_{ij}\} - \min\{\forall r_j \in R : D_{ij}\}$

We define the delivery delay for message $i$ as

$D_i = \max\{\forall r_j \in R : D_{ij}\}$

These definitions are illustrated in Figure 1. We aim at minimizing the inter-receiver
delay jitter while keeping the delivery delay within reasonable bounds.
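The two definitions translate directly into code. The sketch below computes the inter-receiver delay jitter and the delivery delay for two messages; the delay values (in milliseconds) and the message names are made-up examples, not data from the paper.

```python
def inter_receiver_jitter(delays):
    """Jitter of one message: latest reception minus earliest reception."""
    return max(delays) - min(delays)


def delivery_delay(delays):
    """Delivery delay of one message: the delay to the slowest receiver."""
    return max(delays)


# Hypothetical per-receiver delays in milliseconds for two messages
# sent to three receivers r1, r2, r3.
delays = {
    "m1": [40, 55, 72],
    "m2": [38, 90, 61],
}

for message, per_receiver in delays.items():
    print(message,
          "jitter:", inter_receiver_jitter(per_receiver), "ms,",
          "delivery delay:", delivery_delay(per_receiver), "ms")
```

Holding a message back at the fast receivers until the slowest one has it would bring the jitter toward zero without increasing the delivery delay, which is the effect the server-based approach aims for.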

Whereas we focus in this paper on an approach at application layer, related
work concentrates on network aspects. The main interest lies in considering
inter-destination delay jitter during multicast routing tree setup.
An approach to compensate the jitter caused by the network has also been
presented. The algorithm is based on a delay-fair multicast tree and additionally
assumes that the packet scheduling delay is bounded and known. The
nodes are modelled as a regulator followed by a packet scheduler for each outgoing
link. The regulator delays each packet until its eligible time to be scheduled
is reached. The calculations of the algorithm are based on worst-case assumptions
of the packet scheduler delay bounds. The current network load is not
considered.

Our work differs from related work in the following way. We propose an 
approach that works at application layer and is therefore deployable in short 
time. Our main goal is to present a scalable approach which fits well to the 
current Internet structure without the need for router changes. 



3 A Server Based Architecture for Inter-receiver Delay Jitter Smoothing

The applications that can benefit from an inter-receiver delay fair service differ
widely in their data rate requirements. At the moment we focus on applications
that transmit data at a certain minimum rate, like stock quotes for example,
though our approach is extensible to lower data rates. We propose
a solution to the problem at application layer, which leaves the approach
independent of the core network nodes. The nature of the problem does not allow
us to trust the receivers to cooperate. Hence, the receivers are not subject
to any condition. However, we assume a core network where it is safe to place
special hardware that can be trusted.

Fig. 1. Multicast message reception time and inter-receiver delay jitter



3.1 Architecture 

The architecture of the approach is shown in Figure 2. In the trusted region
of the network we place a set of servers. These servers are used to smooth the
inter-receiver delay jitter. To accomplish this, the data messages are directed via
these servers to be synchronized in terms of delivery delay on their way to the
receivers. The servers have to be placed near the receivers in order to be most
efficient. A reliable multicast protocol can be used to transmit the messages to
the servers. From the servers, the information is delivered to the receivers at a
time specified by the information source. There are further efforts to be made,
which we will only briefly mention here. First, the receivers have to select a server
that is as close as possible in terms of delay. An expanding ring search (ERS) [2]
or the Token Repository Service will provide this service. Second, messages
should be encrypted to provide secrecy between sender and servers in order to
prevent receivers from obtaining the information directly from the sender. Moyer
et al. give an overview of frameworks providing security. Basic mechanisms
can be used because a server group membership change will be a rare event.
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Fig. 2. Placing servers in the trusted region 



The synchronisation of the data messages at the servers results in a smoothing
of the inter-receiver delay jitter. However, as we assume a packet data network
without synchronous message delivery, there is no way to tell all servers
at the same time that they can instantly reveal the information. Indeed, this is
precisely the problem we are trying to solve. This means that the sender has to
predict the time at which the information will have arrived at all servers
and to send this predicted time along with the message. We call this time the
information revealing time. The clocks of the information source and the servers
are assumed to be synchronized. For example, the Network Time Protocol (NTP)
achieves synchronisation accuracies in the order of tens of milliseconds over
Internet paths. The server based inter-receiver delay jitter smoothing is depicted
in Figure 3. Basically, our approach works in the following way:

1. Determine an information revealing time at the sender. This can be done by
estimation, configuration or by a test message that contains no application
data.

2. The sender transmits the data message, including the revealing time, with
multicast to the set of servers.

3. The servers receive the data and deliver it at the revealing time to the
connected receivers. This can be achieved by unicast or by a server specific
multicast group. The server provides feedback information about the
suitability of the information revealing time to the sender (see Section 3.3 for
details).

4. The receivers have to deliver the incoming data from the servers instantly
to the application.

5. The sender collects the feedback information from the servers.

6. The sender determines a new revealing time based on the received feedback.

7. Proceed with step 2.

Fig. 3. The servers forward the messages at the information revealing time specified
by the sender to smooth the inter-receiver delay jitter
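
The server-side part of these steps can be sketched as follows. This is a minimal illustration, not the implementation used in the simulations: the names `DataMessage`, `server_delivery_time` and `inter_receiver_jitter` are hypothetical, all times are plain numbers on a common synchronized clock, and the network transport is left out.

```python
from dataclasses import dataclass


@dataclass
class DataMessage:
    seq: int                # message sequence number
    revealing_time: float   # absolute revealing time chosen by the sender
    payload: bytes


def server_delivery_time(msg, arrival_time):
    """Return (delivery_time, expired) for one server.

    The server holds the message until the revealing time. If the message
    arrives after the revealing time has expired, it is forwarded at once
    and the server has a reason to send feedback.
    """
    expired = arrival_time > msg.revealing_time
    return max(arrival_time, msg.revealing_time), expired


def inter_receiver_jitter(delivery_times):
    """Time span between the earliest and the latest delivery."""
    return max(delivery_times) - min(delivery_times)
```

If the revealing time is later than every arrival, all servers deliver at the same instant and the jitter at the servers is zero; any late server reintroduces jitter equal to its lateness.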



From the servers the data is delivered with multicast or unicast to the
receivers, depending on the number of receivers attached to a server. Note that
the delay between the servers and the receivers cannot be taken into account when
delivering the message. The reason is that it is not possible for the servers to
determine the delay to the untrusted receivers in a secure way. For example, if a
server tries to measure the delay by sending ping requests to the receivers, the
answer can be deliberately delayed, which would result in an earlier delivery of
the messages to that receiver.

The following sections will describe the feedback mechanism and the algo- 
rithm to determine the revealing time in more detail. 

3.2 Feedback Protocol 

Servers have to provide feedback to the sender to allow a dynamic reaction to
changing network conditions. A simple approach would be to send the feedback
with unicast. However, we have learned from reliable multicast implementations
that such an approach may result in a large number of messages, which
overwhelms the sender and therefore limits scalability. Our approach
is based on the idea of suppressing the feedback information, which is also used
in the more scalable reliable multicast protocols.

Alternative 1: Revealing Time Expiration Notification. The servers send a
multicast feedback message if the information revealing time has already expired at
data message reception time. The feedback message contains only the message
sequence number. This tells the sender to increase the revealing time without
transmitting an exact value. To prevent all servers from answering at the
same time, the feedback send time is chosen randomly. The feedback is
suppressed if one of the servers has already answered for the same message
sequence number, or if less than one message round trip time has passed since the
last feedback was sent. The feedback tells the source that the deadline
was too short for at least one server and should therefore be extended. Since
feedback is only gathered from the servers where the data has arrived too late,
no feedback information (and thus less message overhead) is necessary in
the case where all servers got the data in time. Therefore, an algorithm like the
multiplicative increase, additive decrease algorithm (see Section 3.3) is necessary
to find a suitable deadline value.
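
The suppression rule of this alternative can be sketched as below. The function names, the uniform random delay and the `max_wait` parameter are assumptions made for illustration; the text only states that the send time is chosen randomly.

```python
import random


def schedule_feedback(now, max_wait, rng=random):
    """Pick a random send time in [now, now + max_wait] (assumed uniform)."""
    return now + rng.uniform(0.0, max_wait)


def should_send(seq, answered_seqs, send_time, last_feedback_time, rtt):
    """Decide at send_time whether the scheduled feedback is still needed.

    seq                -- sequence number of the late data message
    answered_seqs      -- sequence numbers already answered by any server
    last_feedback_time -- time of the last feedback seen, or None
    rtt                -- message round trip time
    """
    if seq in answered_seqs:
        return False  # another server has already reported this message
    if last_feedback_time is not None and send_time - last_feedback_time < rtt:
        return False  # less than one round trip since the last feedback
    return True
```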

Alternative 2: Desired Revealing Time Notification. A further option is to send
the feedback with the message sequence number and the desired future delivery
delay included. The desired delivery delay is estimated from the mean
and variation of the delays $D_i$ of previously received messages, using the
formulas established by Jacobson for TCP round trip time calculation. A server
can request an extended or a reduced information revealing time. The sender,
however, is interested only in the highest information revealing time. This
requires another suppression mechanism. Again the feedback is sent to the
multicast group to which all servers belong. A server sends feedback scheduled
at a random time, provided that no feedback has been received with a higher or
equal information revealing time for the same message sequence number. The
delay $D_i$ can be determined with the filter algorithm discussed in Section 3.3.
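
A hedged sketch of such an estimator is given below. It follows the smoothed mean and variation scheme used for TCP's retransmission timer; the gains 1/8 and 1/4, the factor 4 and the initialisation are the usual TCP values (as in RFC 6298) and are assumptions here, since the text does not state which constants are used. The class and function names are hypothetical.

```python
class DesiredDelayEstimator:
    """Estimate a desired delivery delay from observed message delays."""

    def __init__(self, gain_mean=0.125, gain_var=0.25, k=4.0):
        self.gain_mean = gain_mean  # weight of a new sample in the mean
        self.gain_var = gain_var    # weight of a new sample in the variation
        self.k = k                  # safety factor on the variation
        self.mean = None
        self.var = None

    def update(self, delay):
        """Feed one measured delay and return the desired delivery delay."""
        if self.mean is None:
            self.mean, self.var = delay, delay / 2.0
        else:
            deviation = abs(self.mean - delay)
            self.var = (1 - self.gain_var) * self.var + self.gain_var * deviation
            self.mean = (1 - self.gain_mean) * self.mean + self.gain_mean * delay
        return self.mean + self.k * self.var


def should_send_desired(own_value, heard_values):
    """Suppress feedback if a higher or equal value was already heard
    for the same message sequence number."""
    return all(own_value > other for other in heard_values)
```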

There are further design options for the feedback protocol that we will not
describe here in more detail since they are based on more extensive requirements.
Instead of sending the feedback with multicast to the sender and the other
servers, the feedback could be gathered by a tree-based mechanism. The advantage
would be to prevent other servers from receiving feedback information. Such a
gathering tree could, for example, be provided by a Concast service.

3.3 Information Revealing Time Calculation 

As mentioned, the information revealing time has to be predicted. As the inter- 
receiver delay jitter is changing over time, a dynamic mechanism is necessary 
to adapt to network load changes. In this subsection we give for each feedback 
protocol alternative an example of calculating the information revealing time. 

Filter Mechanism. A suitable way to predict the information revealing time is to
use feedback information of the message delay (preferred information revealing
time), which can be provided by the servers. Using that feedback, a new value
for the deadline is calculated via a filter operation of the form

$$d_t = \alpha \cdot d_{t-1} + (1 - \alpha) \cdot d_{\text{feedb}}, \qquad 0 < \alpha < 1$$

where $d_t$ is the predicted delay value, $d_{t-1}$ is the predicted delay value of the
previous interval, $d_{\text{feedb}}$ is the delay value feedback of the previous interval and
$\alpha$ is a parameter which adjusts the influence of the measured value on the
predicted delay. We transmit the predicted information revealing time along with
the data. In this case feedback has to be gathered both if the deadline was
sufficiently large and if the deadline was not sufficient.
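
As a small illustration of the filter operation, the update can be written as a one-line function. The name `filter_update` and the example values are hypothetical.

```python
def filter_update(d_prev, d_feedb, alpha):
    """New predicted delay from the previous prediction and the feedback.

    alpha close to 1 keeps the prediction stable; alpha close to 0 makes it
    follow the latest feedback value almost immediately.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * d_prev + (1.0 - alpha) * d_feedb
```

For example, with a previous prediction of 100 ms, a feedback value of 200 ms and a weight of 0.5, the new prediction is 150 ms.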

Increase/Decrease Mechanism. Another option is to predict the information
revealing time via an algorithm that multiplicatively increases the deadline if
feedback information has been received and additively decreases the deadline if
no feedback has been received:

$$d_t = c_1 \cdot d_{t-1}, \qquad c_1 > 0$$

$$d_t = d_{t-1} - c_2, \qquad c_2 > 0$$

where $c_1$ and $c_2$ are constants. The first rule applies when feedback has been
received and the second when it has not. Then, no feedback information is
necessary in the case where all servers got the data in time.
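
The two rules can be sketched as a single update function. The default constants below are illustrative assumptions, not values taken from the simulations; note that the first rule only lengthens the deadline when the multiplicative constant is greater than 1.

```python
def increase_decrease_update(d_prev, feedback_received, c1=2.0, c2=1.0):
    """Multiplicative increase on feedback, additive decrease otherwise.

    d_prev            -- deadline used for the previous message
    feedback_received -- True if any server reported an expired deadline
    c1, c2            -- hypothetical constants (c1 > 1 for a real increase)
    """
    if feedback_received:
        return c1 * d_prev
    return d_prev - c2
```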

3.4 Simulation Results 



[Plot: Information Revealing Time (IRT) and Inter-receiver Delay Jitter (J)]

Fig. 4. Revealing time expiration notification protocol and increase/decrease algorithm



We have simulated the multicast distribution of messages sent from an
information source to a set of servers to see whether a reduction of the
inter-receiver delay jitter can be achieved and to study the proposed feedback
protocols. In the experiments we have analysed to what degree the algorithms
minimize the inter-receiver delay jitter and which costs are involved. The costs
include the increase of the delay and the message overhead introduced by the
algorithm.

The simulations were carried out with the Network Simulator NS. The topology
generator GT-ITM was used to generate transit-stub networks. To account
for the jitter due to packet loss we used the multicast transport protocol
SRM, though our approach does not depend on a particular transport protocol. The
simulations consider neither the influence of unsynchronised clocks of the sender
and the servers nor the message overhead introduced by message encryption.
We have varied the number of nodes between 10 and 1000 and the sending
rate between 1 and 128 kbit/s.



[Plot: Information Revealing Time (IRT) and Inter-receiver Delay Jitter (J) over message send time]

Fig. 5. Desired revealing time notification protocol and filter algorithm



Typical results are shown in Figures 4 and 5 for both information revealing
time calculation algorithms. The (dotted) bars in the graph indicate the time
span between the first arrival of message i at a server and the last arrival of
message i at another server, i.e. the inter-receiver delay jitter at the servers.
The information revealing time for each message is shown by the solid line. The
figures show the results for a simulation of a 200-node network. One node of
the network is designated the information source and 70 nodes of the network
are servers. To simulate network load, 100 web clients request HTTP traffic from
10 web servers. Additionally, 50 nodes are configured as FTP servers. The FTP
traffic is generated between 10 and 70 s simulation time. The information
source sends a CBR data stream at 32 kbit/s.

Fig. 6. Feedback overhead for increase/decrease algorithm

Fig. 7. Feedback overhead for filter algorithm



Whereas the filter algorithm can adapt faster to changing jitter, the
increase/decrease algorithm causes less message overhead. With the more
conservative increase/decrease algorithm a lower inter-receiver delay jitter is
achieved, although the overall delay is higher. The message overhead of incoming
feedback messages at the information source, relative to the number of outgoing
data messages, is about 400 percent with the Desired Revealing Time Notification
and about 10 percent with the Revealing Time Expiration Notification feedback
protocol.

4 Conclusion 

In this paper, we examined the multicast fairness problem concerning
inter-destination delay jitter. We proposed an approach at the application layer.
The proposed algorithms make fewer assumptions about the core network nodes and
are therefore easier to deploy in the Internet than previous work. Furthermore,
these algorithms are able to consider the current network load, which improves
overall message delivery delay. Due to the IP multicast model, however, they
require security mechanisms. The simulations of the approach showed a significant
reduction of inter-receiver delay jitter and a dynamic reaction to network load
changes.

At the moment the reactions of the algorithms are coupled with the data 
rate of the information source. We intend to decouple the algorithms to allow a 
broader range of applications to use the service. For low data rate applications, 
test messages can trigger feedback from the servers. Although we intend to de- 
velop a transport protocol independent approach, we will examine how efficient 
a tighter coupling with an appropriate transport protocol would be. For exam- 
ple, the transport protocol control messages could be used to transmit feedback 
information to optimize bandwidth consumption. 

Abstract. This paper addresses multicast routing in multi-hop optical
networks employing wavelength-division multiplexing (WDM). We con- 
sider a model in which multicast communication requests are made and 
released dynamically over time. A multicast connection is realized by con- 
structing a multicast tree which distributes the message from the source 
node to all destination nodes such that the wavelengths used on each 
link and the receivers and transmitters used at each node are not used 
by existing circuits. We show that although the routing and wavelength 
assignment in this model is NP-complete, the wavelength assignment 
problem can be solved in linear time. 



1 Introduction 

Wavelength-division multiplexing (WDM) is emerging as a key technology in 
communication networks. In WDM networks the fiber bandwidth is partitioned 
into multiple data channels which may be transmitted simultaneously on dif- 
ferent wavelengths. Thus, WDM permits use of enormous fiber bandwidth by 
providing data channels whose individual bandwidths more closely match those 
of the electronic devices at their endpoints. 

WDM networks can be classified as either single-hop or multi-hop networks.
In single-hop (or all-optical) networks each message is transmitted from the
source to the destination without any optical-to-electronic conversion within the
network. Single-hop communication can be realized by using a single wavelength
to establish a connection, but such connections may in general be difficult or
impossible to find. Alternatively, all-optical wavelength converters may be used to
convert from one wavelength to another within the network, but such converters
are likely to be prohibitively expensive for most applications in the foreseeable
future.

* This work was supported by the National Science Foundation under grant CCR- 
9900491 to Harvey Mudd College and grant MIP 96-33729 to the University of 
Pittsburgh. The authors also gratefully acknowledge the assistance of Mr. Adam 
Fineman who implemented the algorithms described here and performed the com- 
putational experiments. 

In multi-hop communication networks a message entering an intermediate
node on a particular wavelength can be converted into the electronic medium
by a receiver and retransmitted on a new wavelength by a transmitter. Each
conversion of the message from one wavelength to another is called a hop.
Multi-hop networks have been shown to enjoy higher utilization of bandwidth and
lower probability of blocking than single-hop networks. However, a multi-hop
connection may use more transmitters and receivers than a single-hop connection
and, depending on the network architecture, each hop can contribute significantly
to the communication latency. Therefore, it is generally desirable to find
multi-hop connections that minimize the number of transmitters and receivers
and/or the number of hops used.

Finally, a network may support unicast (or one-to-one) communication as 
well as multicast (or one-to-many) communication. Multicast communication 
is used in distributed shared memory clusters to support operations such as 
cache invalidation and is used in wide-area networks for video distribution and 
teleconferencing among other applications. 

In this paper we consider multicast communication in multi-hop circuit- 
switched networks. We assume networks with an arbitrary number of nodes, 
a fixed number of transmitters and receivers at each node, and a fixed number 
of wavelengths on each link. Multicast communication requests are made and 
released over time. A multicast connection may be realized by constructing a 
multicast tree which distributes the message from the source node to all desti- 
nation nodes such that the wavelengths used on each link and the receivers and 
transmitters used at each node are not used by existing circuits. 

The routing and wavelength assignment (RWA) problem is that of selecting a 
multicast tree, the wavelengths on the links in the tree, and thus the intermediate 
nodes that will perform wavelength conversion. In the wavelength assignment 
(WA) problem a multicast tree is given and the problem is that of selecting the 
wavelengths on the links in the tree and the intermediate nodes for wavelength 
conversion. 

In this paper we show that although the RWA problem in this model is, in 
general, NP-complete, the WA problem can be solved in linear time. In addition, 
we show that the linear time WA algorithm can be extended to find “optimal” so- 
lutions under various definitions of optimality such as minimizing the maximum 
number of hops. 

Various aspects of multicasting in WDM networks have been investigated
recently for both packet- and circuit-switched networks. In work most
closely related to the results described here, Kovacevic and Acampora have
investigated the WA problem for multi-hop unicast routing in circuit-switched
meshes and Sahasrabuddhe and Mukherjee have formulated the RWA problem
for multi-hop multicast routing in packet-switched networks as a mixed-integer
linear programming problem.

The remainder of this paper is organized as follows. In Section 2 we formally
describe the model under consideration and define notation. In Section 3 we give
a linear time algorithm for the wavelength assignment problem and generalize
the algorithm to find “optimal” multicasts. Section 4 describes experimental
results using these algorithms. Conclusions are given in Section 5.

2 Model and Notation 

We represent an interconnection network by a connected directed graph
$G = (V, E)$ where the vertices represent switches and the directed edges represent
links between pairs of switches. Each switch may be connected to a node or
network access station. Except where the distinction is necessary, we henceforth
use the terms “switch”, “node”, and “vertex” interchangeably and let $n$ denote
$|V|$. Similarly, we use “link” and “edge” interchangeably. Each link can carry
some number, $w$, of different wavelengths denoted by
$\Lambda = \{\lambda_1, \ldots, \lambda_w\}$. Each node $v$ has $T(v)$ tunable
transmitters and $R(v)$ tunable receivers, each of which can tune to any of the
$w$ wavelengths. Let $d_{\mathrm{in}}(v)$ and $d_{\mathrm{out}}(v)$ denote the number
of incoming and outgoing links, respectively, at node $v$. We assume that the
number of nodes $n$ in the network is variable but that parameters $w$, $T(v)$, $R(v)$,
$d_{\mathrm{in}}(v)$, and $d_{\mathrm{out}}(v)$ are bounded by constants dictated by the technology.

A wavelength on an input link may be routed to the same wavelength on any
number of output links and, optionally, to a receiver at the local node. Similarly,
a message transmitted on a particular wavelength by a transmitter at a node may
be routed on this wavelength to any number of output links. Routing must satisfy
the constraint that two messages using the same wavelength cannot share the
same link. A switch model with these properties is shown in Figure 1. Switches
with some similar characteristics were described by Kovacevic and Acampora
and by Sahasrabuddhe and Mukherjee. We note that the results described in
this paper can be adapted to a number of other switch models.

A multicast communication request is an ordered pair $(s, D)$ where $s \in V$ is
the source of the multicast and $D \subseteq V \setminus \{s\}$ is the set of destination
nodes. We assume that multicast communication requests are made and released
dynamically. At the time that a particular multicast communication request is
made there may be some limits imposed on the routing resources available in the
network. Specifically, each node $v$ has some available number $t(v)$ of
transmitters and $r(v)$ of receivers that can be used to implement the multicast,
where $t(v) \le T(v)$ and $r(v) \le R(v)$. In addition, each link $(v, x)$ has some
set $w(v, x) \subseteq \Lambda$ of available wavelengths. Let $W(v)$ denote the total
number of distinct wavelengths available on all outgoing links from node $v$. These
resource limits may reflect the actual available resources, due to utilization of
resources by existing connections, or these limits may be imposed in order to
reduce cost or to leave resources available for subsequent connection requests.
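
As a small illustration of this notation, the quantity W(v) can be computed from the per-link sets of available wavelengths. The dictionary layout (`avail` keyed by directed link) and the function name are assumptions made for this sketch only.

```python
def distinct_outgoing_wavelengths(v, avail):
    """W(v): number of distinct wavelengths available on all links leaving v.

    avail maps a directed link (tail, head) to the set w(tail, head) of
    wavelengths currently available on that link.
    """
    outgoing = [ws for (tail, _head), ws in avail.items() if tail == v]
    return len(set().union(*outgoing))


# Hypothetical example: two links leave v, one link enters it.
avail = {("v", "x"): {1, 2}, ("v", "y"): {2, 3}, ("u", "v"): {4}}
```

Here W(v) counts wavelengths 1, 2 and 3 once each; the wavelength on the incoming link does not contribute.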

Fig. 1. Schematic of switch model.

Due to these resource constraints, it may not be possible to realize a multicast
communication request. Moreover, in some cases it may not be possible to realize
a request when only one wavelength may carry the message on each link, while
the connection may be realizable when multiple wavelengths are permitted to
carry the message on the same link. Let $\ell$ denote the maximum number of
wavelengths that may be used to transmit the same message over any single
link. Figure 2 illustrates an example of a network in which node $s$ is the source
of the multicast and all the remaining nodes are destinations. In this example,
node $s$ has two transmitters while all remaining nodes have zero transmitters.
When $\ell = 1$, node $s$ may use only a single wavelength on each link. Since node
$u$ has no transmitters, it is not possible for the message to be delivered to both
destinations $w$ and $x$. On the other hand, when $\ell = 2$ node $s$ may transmit on
both wavelengths $\lambda_1$ and $\lambda_2$ over each link. In this case, all destination
nodes can be reached.

We now formalize the definitions of the RWA and WA problems. 

Definition 1. Let $G = (V, E)$ be a directed graph and $(s, D)$ a multicast
communication request in this graph. A routing and wavelength assignment (RWA) is a
collection of links, wavelengths on these links, and wavelength settings for
transmitters and receivers at each node such that: each $v \in D$ receives the message
from $s$, at most $\ell$ wavelengths from $w(v, x)$ are used on each link $(v, x) \in E$, and
no more than $t(v)$ transmitters and $r(v)$ receivers are used at each node $v \in V$.



Definition 2. Let $G = (V, E)$ be a directed graph, $(s, D)$ a multicast
communication request in this graph, and $\tau$ a subtree of $G$ with root $s$ and
containing all vertices in $D$. A wavelength assignment (WA) with respect to $\tau$ is a
set of wavelengths on the links in $\tau$ and wavelength settings for transmitters and
receivers at each node in $\tau$ such that: each $v \in D$ receives the message from $s$,
at most $\ell$ available wavelengths from $w(v, x)$ are used on each link $(v, x)$ in
$\tau$, and no more than $t(v)$ transmitters and $r(v)$ receivers are used at each node
$v$ in $\tau$.

Fig. 2. Node s has two transmitters and all other nodes have no transmitters.

The following two theorems show that although the RWA problem can be
solved efficiently for some special cases, the RWA problem is, in general,
NP-complete. The proofs of these results are omitted in the interest of space.

Theorem 1. For any value of $\ell \ge 1$, if $t(v) \ge W(v)$ and $r(v) \ge 1$ for all
$v \in V$ then a RWA can be found, or it can be determined that none exists, in time
$O(n)$.

Theorem 2. For any $\ell \ge 1$, if $t(v) < W(v)$ for some nodes $v \in V$ then the
problem of determining if there exists a RWA is NP-complete.



3 The Wavelength Assignment Problem 

In this section we show that the wavelength assignment problem can be solved 
in linear time. Throughout this section, the following assumptions are made: 

1. A fixed multicast tree is given with source node s at the root. All destination 
nodes are in the tree, although the tree may also contain non-destination 
nodes. 

2. All leaves in the multicast tree are destination nodes. (Otherwise leaf nodes 
can be repeatedly removed until this property is true.) 

3. For each destination node v in the multicast tree, r(v) > 0. (Otherwise no 
wavelength assignment exists.) 

We begin in Subsection 3.1 by examining the case that $\ell = 1$. In Subsection 3.2
we show how the algorithm can be adapted to find wavelength assignments that
minimize the maximum number of hops from the source to all destinations.

3.1 Wavelength Assignment for $\ell = 1$

The algorithm is based on dynamic programming. For each non-root node $v$, let
$p(v)$ denote the parent of $v$ in the given multicast tree. Then $(p(v), v)$ denotes
the link from the parent of $v$ to $v$. Define the predicate
$m_v : \Lambda \to \{\text{true}, \text{false}\}$ by $m_v(\lambda) = \text{true}$ if and
only if wavelength $\lambda$ is available on the link $(p(v), v)$
and node $v$ can deliver the message to all destinations in its subtree if it receives
the message on wavelength $\lambda$. Recall that every leaf is a destination node. Thus,
from the above definition it follows that for each leaf $v$ in the tree,

$$m_v(\lambda) = \begin{cases} \text{true} & \text{if } \lambda \in w(p(v), v) \\ \text{false} & \text{otherwise} \end{cases} \qquad (1)$$

In other words, if $v$ is a leaf then $m_v(\lambda)$ is true if and only if wavelength
$\lambda$ is available on the link from $v$'s parent to $v$.

Next, consider an internal non-root node $v$ which has no receivers available.
Since $r(v) = 0$, node $v$ may forward the message on the incoming wavelength to
its children but it may not receive the message and then retransmit it on other
wavelengths. Let $C(v)$ denote the set of children of $v$. Let $\bigwedge$ and $\bigvee$
denote the boolean “and” and “or” operators respectively. If $r(v) = 0$,

$$m_v(\lambda) = \begin{cases} \bigwedge_{x \in C(v)} m_x(\lambda) & \text{if } \lambda \in w(p(v), v) \\ \text{false} & \text{otherwise} \end{cases} \qquad (2)$$

This rule asserts that $v$ can deliver a message received on wavelength $\lambda$ to all
destinations in its subtree if and only if $\lambda$ is available on the link entering $v$
from its parent and each child $x$ of $v$ can deliver the message to all destinations
in its subtree if $x$ receives the message on wavelength $\lambda$.

Next, consider the case that $r(v) > 0$. In this case, node $v$ can use wavelength
$\lambda$ to deliver the message to its children and, in addition, node $v$ can receive the
message and retransmit the message to its children using up to $t(v)$ wavelengths
other than $\lambda$. Define a wavelength selection set with respect to $\lambda$ to be a subset
of $\Lambda$ which contains $\lambda$. Let $\Lambda_{\lambda,c}$ denote the set of all wavelength
selection sets with respect to $\lambda$ of size at most $c + 1$. Thus, every set in
$\Lambda_{\lambda,c}$ comprises $\lambda$ and up to $c$ additional wavelengths. Then

$$m_v(\lambda) = \bigvee_{A \in \Lambda_{\lambda, t(v)}} \; \bigwedge_{x \in C(v)} \; \bigvee_{\lambda' \in A} m_x(\lambda') \qquad (3)$$

if $\lambda \in w(p(v), v)$ and otherwise $m_v(\lambda) = \text{false}$. This rule asserts that
$m_v(\lambda)$ is true if and only if wavelength $\lambda$ is available on the link entering $v$
from its parent and there exists some wavelength selection set $A$ comprising
$\lambda$ and up to $t(v)$ additional wavelengths (to be transmitted at $v$) with the
following property: every child $x$ of $v$ can deliver the message to all of its
descendant destinations if it receives the message on one of the wavelengths
$\lambda'$ in set $A$.

Finally, consider the case of the root node $s$. Unlike the other nodes in the
tree, node $s$ does not receive the message from a parent node. Instead, node $s$
transmits the message using up to $t(s)$ different wavelengths. Let $\mathcal{B}_c$ denote
the set of all subsets of $\Lambda$ of size at most $c$. Define $M = \text{true}$ if and only if
a WA exists originating at the source node. Then,

$$M = \bigvee_{B \in \mathcal{B}_{t(s)}} \; \bigwedge_{x \in C(s)} \; \bigvee_{\lambda' \in B} m_x(\lambda') \qquad (4)$$

This rule is analogous to the one in Equation (3) except that node $s$ now
transmits all wavelengths itself rather than receiving one on an incoming link.

The dynamic programming algorithm is shown in Algorithm 1. Recall that
given an acyclic directed graph with $n$ vertices $v_1, \ldots, v_n$, a topological
ordering of the vertices is a permutation $v_{i_1}, \ldots, v_{i_n}$ of the vertices such
that if there is a directed edge from $v_{i_j}$ to $v_{i_k}$ then $j < k$. Since the
multicast tree is acyclic, there exists a topological ordering of the vertices. Note
that by visiting the vertices in the order $v_{i_n}, \ldots, v_{i_1}$, a node is only visited
if all of its descendants have been visited.
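
For a rooted tree, one simple way to obtain such an ordering is a breadth-first traversal from the root; reversing it yields the bottom-up visiting order. The sketch below assumes the tree is given as a hypothetical `children` dictionary and is only one of several possible ways to compute the ordering.

```python
from collections import deque


def top_down_order(root, children):
    """Breadth-first order of a rooted tree: every parent precedes its children."""
    order, queue = [], deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(children.get(v, []))
    return order


def bottom_up_order(root, children):
    """Reverse of the top-down order: every node follows all its descendants."""
    return top_down_order(root, children)[::-1]
```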



    Compute a topological ordering v_{i_1}, ..., v_{i_n} of the n
    nodes in the multicast tree
    for j = n down to 1
        Let v = v_{i_j}
        for each wavelength λ
            if v is a leaf node then compute m_v(λ) using Equation (1)
            if j > 1 and r(v) = 0 then compute m_v(λ) using Equation (2)
            if j > 1 and r(v) > 0 then compute m_v(λ) using Equation (3)
            if j = 1 then compute M using Equation (4)
        end for {Comment: End inner for loop}
    end for {Comment: End outer for loop}
    return (M)

Algorithm 1



Note that the actual WA can be found, if one exists, by recording the wavelength
assignments in addition to the values of $m_v(\lambda)$ and $M$.
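
A compact Python sketch of Algorithm 1 is given below. It is an illustration under stated assumptions rather than the implementation used in the experiments: the tree is passed as a hypothetical `children` dictionary, `avail[(p, v)]` holds the wavelengths available on link (p, v), `t` and `r` map nodes to available transmitters and receivers, and every leaf is assumed to be a destination with at least one receiver, as required above.

```python
from itertools import combinations


def bottom_up(root, children):
    """Return (nodes with descendants before ancestors, parent map)."""
    order, parent, i = [root], {}, 0
    while i < len(order):
        for x in children.get(order[i], []):
            parent[x] = order[i]
            order.append(x)
        i += 1
    return order[::-1], parent


def selection_sets(wavelengths, lam, c):
    """All sets consisting of lam plus at most c other wavelengths."""
    others = [l for l in wavelengths if l != lam]
    for k in range(min(c, len(others)) + 1):
        for extra in combinations(others, k):
            yield (lam,) + extra


def wa_exists(root, children, avail, t, r, wavelengths):
    """Return M: True if a wavelength assignment exists for the given tree."""
    order, parent = bottom_up(root, children)
    m = {}
    for v in order[:-1]:  # every node except the root, bottom-up
        kids = children.get(v, [])
        for lam in wavelengths:
            if lam not in avail[(parent[v], v)]:
                m[v, lam] = False
            elif r[v] == 0:
                # Equation (2): forward lam unchanged to all children.
                m[v, lam] = all(m[x, lam] for x in kids)
            else:
                # Equation (3); for a leaf this reduces to Equation (1).
                m[v, lam] = any(
                    all(any(m[x, l2] for l2 in sel) for x in kids)
                    for sel in selection_sets(wavelengths, lam, t[v]))
    kids = children.get(root, [])
    # Equation (4): the root transmits on up to t(root) wavelengths.
    for k in range(min(t[root], len(wavelengths)) + 1):
        for b in combinations(wavelengths, k):
            if all(any(m[x, l2] for l2 in b) for x in kids):
                return True
    return False


# Hypothetical tree: s -> u -> {w, x}; w is reachable only on wavelength 1
# and x only on wavelength 2.
children = {"s": ["u"], "u": ["w", "x"]}
avail = {("s", "u"): {1, 2}, ("u", "w"): {1}, ("u", "x"): {2}}
r = {"s": 0, "u": 1, "w": 1, "x": 1}
```

In this made-up instance, node u must retransmit on a second wavelength to reach both leaves, so an assignment exists only if u has a transmitter available.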

We now derive an upper bound on the running time of the algorithm. In
general, computing a topological ordering takes time $O(n + m)$ where $n = |V|$
and $m = |E|$. Since our model assumes that the degree of each node is bounded
above by a constant, $m \in O(n)$ and thus the ordering can be computed in
time $O(n)$.

There are a total of $wn$ iterations through the for loops. Among the
computations performed inside the for loops, the computation in Equation (3)
requires the largest number of steps. An upper bound on the number of steps
required to compute $m_v(\lambda)$ in Equation (3) can be derived as follows: for each
wavelength $\lambda$ at most $\sum_{i=0}^{t(v)} \binom{w-1}{i}$ distinct wavelength selection
sets are considered because there are $\binom{w-1}{i}$ ways of choosing $i$ wavelengths
other than $\lambda$ from $\Lambda$. For each wavelength selection set $A$, consider the set of
children, $C(v)$, of node $v$. Set $C(v)$ has size at most $d_{\mathrm{out}}(v)$ and for each
$x \in C(v)$, at most $t(v) + 1$ steps are required to determine if there exists a
wavelength $\lambda' \in A$ such that $m_x(\lambda')$ is true. Therefore, in the worst case the
number of steps required to compute $m_v(\lambda)$ is bounded by

$$\left[ \sum_{i=0}^{t(v)} \binom{w-1}{i} \right] d_{\mathrm{out}}(v) \, (t(v) + 1).$$

Letting $t = \max_{v \in V} t(v) + 1$, $C = \sum_{i=0}^{t} \binom{w-1}{i}$, and
$d = \max_{v \in V} d_{\mathrm{out}}(v)$, the running time of the computations
performed inside the for loops is bounded above by $[wCdt]\,n$. Thus, the
algorithm has $O(n)$ running time, with the constant term depending on constants
$w$, $t$, and $d$. The impact of these constants on the running time, in practice, is
discussed in Section 4.
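
To get a feeling for the size of these constants, the per-entry bound can be evaluated numerically. The helper names below are hypothetical, and the parameter values in the example are chosen only for illustration.

```python
from math import comb


def num_selection_sets(w, c):
    """Number of sets made of a fixed wavelength plus up to c of the other w - 1."""
    return sum(comb(w - 1, i) for i in range(c + 1))


def steps_per_entry(w, t_v, d_out_v):
    """Bound on the steps needed to compute one value m_v(lambda) via Equation (3)."""
    return num_selection_sets(w, t_v) * d_out_v * (t_v + 1)
```

For example, with w = 10 wavelengths, t(v) = 2 transmitters and out-degree 3, there are 1 + 9 + 36 = 46 selection sets and the bound is 46 * 3 * 3 = 414 steps per table entry.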



3.2 Optimal Multicast for $\ell = 1$

In this subsection we show that the dynamic programming solution described in 
the previous subsection can be adapted to find wavelength assignments which 
minimize the maximum number of hops required to reach all destination nodes. 
Similar adaptations can be made for other metrics of optimality. 

For each non-root node $v$, $h_v(\lambda)$ is defined to be the minimum value $k$ such
that there exists a path from $v$ to every destination node in the subtree rooted
at $v$ which uses at most $k$ hops, assuming the message enters $v$ on wavelength
$\lambda$. If wavelength $\lambda$ is not available on link $(p(v), v)$ or it is not possible for
$v$ to reach all of the destination nodes in its subtree when the message enters $v$ on
wavelength $\lambda$ then define $h_v(\lambda) = \infty$.

From the definition, it follows that for each leaf $v$ in the tree,

$$h_v(\lambda) = \begin{cases} 0 & \text{if } \lambda \in w(p(v), v) \\ \infty & \text{otherwise} \end{cases} \qquad (5)$$



Next, consider an internal non-root node $v$. If $r(v) = 0$, node $v$ cannot receive
and retransmit the message but may only distribute the message to its children
using wavelength $\lambda$. Thus, if $r(v) = 0$,

$$h_v(\lambda) = \begin{cases} \max_{x \in C(v)} h_x(\lambda) & \text{if } \lambda \in w(p(v), v) \\ \infty & \text{otherwise} \end{cases} \qquad (6)$$



If $r(v) > 0$, node $v$ may distribute the message to its children on wavelength $\lambda$
without incurring an additional hop. In addition, node $v$ may receive the message
and retransmit it to its remaining children using up to $t(v)$ wavelengths other
than $\lambda$. Each child which receives the message on a wavelength $\lambda'$ other than
$\lambda$ incurs an additional hop. Thus, for $r(v) > 0$,

$$h_v(\lambda) = \min_{A \in \Lambda_{\lambda, t(v)}} \; \max_{x \in C(v)} \; \min_{\lambda' \in A} \begin{cases} h_x(\lambda') & \text{if } \lambda' = \lambda \\ 1 + h_x(\lambda') & \text{if } \lambda' \ne \lambda \end{cases} \qquad (7)$$

if $\lambda \in w(p(v), v)$ and otherwise $h_v(\lambda) = \infty$.
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Let H denote the minimum number of hops required. One hop is incurred 
by the initial transmission of the message at node s. Therefore, 

H = l+ min max min ha, (AO (8) 

Finally, in Algorithm 1, m„(A) and M are replaced by /i„(A) and iJ, respec- 
tively, and Equations (1), (2), (3), and (4) are replaced by Equations (5), (6), 
(7), and (8), respectively. The asymptotic running time and constants are easily 
verified to be the same as that of the original algorithm. 
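
The recurrences for the hop-minimizing variant can be illustrated with a small sketch. This is not Algorithm 1 itself: it is a minimal, exhaustive evaluation of Equations (5) to (8) under stated assumptions. Node 0 is taken to be the source $s$, every leaf is assumed to be a destination, and the set over which the outer minimum ranges is read as the entering wavelength plus at most $t(v)$ other wavelengths, which is an interpretation of the surrounding text. The class name, method names, and array layout are hypothetical, and wavelength sets are stored as bit masks, so the sketch only handles small $w$.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of recurrences (5)-(8); names and data layout are illustrative. */
final class MinHopSketch {
    static final int INF = Integer.MAX_VALUE / 2; // stands in for "infinity"

    final int w;             // number of wavelengths, indexed 0..w-1 (assumes w <= 30)
    final int[][] kids;      // kids[v] = children of v; node 0 is the source s
    final boolean[][] avail; // avail[v][l]: wavelength l is available on link (p(v), v)
    final int[] r, t;        // receivers r(v) and transmitters t(v) per node

    MinHopSketch(int w, int[][] kids, boolean[][] avail, int[] r, int[] t) {
        this.w = w; this.kids = kids; this.avail = avail; this.r = r; this.t = t;
    }

    /** Table of h_v(lambda) for every wavelength lambda, for a non-root node v. */
    int[] h(int v) {
        List<int[]> child = childTables(v);
        int[] out = new int[w];
        for (int l = 0; l < w; l++) {
            if (!avail[v][l]) out[l] = INF;                      // lambda not usable on (p(v), v)
            else if (child.isEmpty()) out[l] = 0;                // Eq. (5): leaf
            else if (r[v] == 0) out[l] = worst(child, 0, l, 1);  // Eq. (6): pass-through only
            else out[l] = best(child, 0, 0, t[v], l, 1);         // Eq. (7): may retransmit
        }
        return out;
    }

    /** Eq. (8): one hop for the initial transmission at the source. */
    int minHops() {
        int b = best(childTables(0), 0, 0, t[0], -1, 0);
        return b >= INF ? INF : 1 + b;
    }

    private List<int[]> childTables(int v) {
        List<int[]> tables = new ArrayList<>();
        for (int x : kids[v]) tables.add(h(x));
        return tables;
    }

    /** Minimum over all sets of at most `left` extra wavelengths (other than `pass`). */
    private int best(List<int[]> child, int from, int mask, int left, int pass, int extra) {
        int result = worst(child, mask, pass, extra);
        if (left == 0) return result;
        for (int l = from; l < w; l++) {
            if (l == pass) continue;
            result = Math.min(result, best(child, l + 1, mask | (1 << l), left - 1, pass, extra));
        }
        return result;
    }

    /** Maximum over children of the cheapest wavelength each child can be served on. */
    private int worst(List<int[]> child, int mask, int pass, int extra) {
        int max = 0;
        for (int[] hx : child) {
            int cheapest = pass >= 0 ? hx[pass] : INF;           // served on the entering wavelength
            for (int l = 0; l < w; l++) {
                if ((mask & (1 << l)) != 0) cheapest = Math.min(cheapest, extra + hx[l]);
            }
            max = Math.max(max, Math.min(cheapest, INF));
        }
        return max;
    }
}
```

For a chain s, v, leaf in which the two links carry different wavelengths, `minHops()` yields 2 when v has a receiver and a transmitter, and `INF` when $r(v) = 0$. Enumerating wavelength subsets explicitly keeps the sketch close to the equations; it is exponential in $t(v)$, which matches the observation that the constant term grows with $w$ and $t$.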

3.3 Wavelength Assignment for $\ell > 1$

As illustrated in the example in Figure El a WA may not exist when each link is 
permitted to send the message on only one wavelength but may exist when more 
than one wavelength may be used per link. The algorithms described above can 
be easily extended to handle the case that $\ell > 1$. The details are omitted in the
interest of space. 

4 Experimental Results 

In this section we describe experimental results using the algorithms presented
in the previous section. In the interest of space, we restrict our attention to the
case that $\ell = 1$. The first set of experiments used the wavelength assignment
algorithm described in Subsection 3.1 to measure the number of multicast requests
that were successfully realized as a function of the number of available
wavelengths per link and number of available transmitters per node. Specifically,
a random multicast tree with 100 nodes ($n = 100$) was generated in which each
node had between 0 and 3 children ($0 \le d_{out}(v) \le 3$). The generated tree had
height 8 and the destination nodes comprised the 53 leaves of the tree. Each
link was assumed to carry 10 distinct wavelengths ($w = 10$). Very similar results
to those reported below were obtained for other randomly generated multicast
trees with other values of these parameters.

In one group of experiments the number of available transmitters per node
was chosen at random from the uniform [0, 2] distribution and in the second
group the uniform [1, 3] distribution was used. In all experiments, the number of
available receivers per node was set to 1. In each group of experiments the set of
available wavelengths on each link was also selected at random, where the size of
the set was taken from the uniform $[x - 1, x + 1]$ distribution for a given value of
$x$. For each value of $x$ ranging from 2 to 9, 100 runs were performed. The data
labeled “Exact Solution” in Figure 3 shows the results of these experiments for
the two groups of experiments.

We have noted that the dynamic programming algorithms run in time $O(n)$
but the constant term depends on the number of wavelengths, transmitters per
node, and degree of the switches. For the experiments described above, the maximum
amount of time required by the dynamic program for a multicast request
was 0.11 seconds on a 450 MHz Pentium 2. However, for larger values of the parameters
the running time was significantly larger. For example, for a problem
instance with 100 nodes, 32 wavelengths per link, switches of degree 8, up to
3 transmitters available per node for each multicast request, and an average of
half of the 32 wavelengths available on each link, the running time increased to
11.38 seconds.

Therefore, in some situations it may be desirable to use heuristics that are
faster or simpler than the dynamic programs described here. The exact solutions
found by the dynamic programming algorithms can then be used off-line to evaluate
the quality of such heuristics. As an example, we have investigated a simple
greedy heuristic for finding wavelength assignments. The heuristic operates as
follows. The source node, $s$, determines the available wavelength that can be
used to reach the largest number of its children, breaking ties arbitrarily. Then
the available wavelength is found that reaches the largest number of remaining
children. This process is repeated until a set $S$ of wavelengths is found that can
be used to reach all of the children of $s$. If the number of wavelengths in $S$ exceeds
the number of transmitters available at $s$, the heuristic fails to satisfy the
multicast request and terminates. Otherwise each child $x$ of the source node may
receive the message on any one of the wavelengths in $S \cap w(s, x)$. For each child
$x$ of $s$, the heuristic determines which $\lambda \in S \cap w(s, x)$ reaches the largest number
of children of $x$. This wavelength is then used to deliver the message from $s$ to
$x$ and then from $x$ to as many of its children as possible. Next, the heuristic
repeatedly selects the wavelength that can be used to reach the largest number
of remaining children of $x$ until all children of $x$ are reachable with the selected
wavelengths. If the number of wavelengths selected is larger than the number of
transmitters at $x$ then the heuristic fails to satisfy the request and terminates.
Otherwise, this process is repeated until all destination nodes are reached.
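
The selection step that the heuristic performs at a single node can be sketched as follows. This is only an illustration of the greedy choice described above, not the complete heuristic: it ignores the wavelength on which the message arrives at the node, and ties are broken by taking the lowest wavelength index, which is one arbitrary choice. The class name, the method name, and the boolean-matrix input format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the greedy wavelength selection at one node. */
final class GreedyCoverSketch {
    /**
     * avail[x][l] is true when wavelength l is available on the link to child x.
     * Returns the selected wavelengths in the order chosen, or empty when the
     * children cannot all be reached or more wavelengths than transmitters are needed.
     */
    static Optional<List<Integer>> selectWavelengths(boolean[][] avail, int transmitters) {
        int children = avail.length;
        int w = children == 0 ? 0 : avail[0].length;
        boolean[] reached = new boolean[children];
        int remaining = children;
        List<Integer> chosen = new ArrayList<>();
        while (remaining > 0) {
            // Find the wavelength that reaches the most children not yet reached.
            int bestWavelength = -1;
            int bestCount = 0;
            for (int l = 0; l < w; l++) {
                int count = 0;
                for (int x = 0; x < children; x++) {
                    if (!reached[x] && avail[x][l]) count++;
                }
                if (count > bestCount) {
                    bestCount = count;
                    bestWavelength = l;
                }
            }
            if (bestWavelength < 0) return Optional.empty(); // some child has no usable wavelength
            chosen.add(bestWavelength);
            for (int x = 0; x < children; x++) {
                if (!reached[x] && avail[x][bestWavelength]) {
                    reached[x] = true;
                    remaining--;
                }
            }
        }
        // The request fails at this node when the set exceeds the available transmitters.
        return chosen.size() <= transmitters ? Optional.of(chosen) : Optional.empty();
    }
}
```

With three children whose links offer wavelengths {0, 2}, {0}, and {1}, the sketch selects wavelength 0 and then wavelength 1, so the request succeeds at this node with two transmitters and fails with one.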

This heuristic has $O(n)$ running time but a significantly smaller constant
term than that of the dynamic program. In comparison to the 11.38 seconds
incurred by the dynamic program for the largest problem instance described
above, this heuristic required only 0.01 seconds for the same data. The results of
running this greedy heuristic for the data used above are shown in Figure 3 for
comparison with the exact solutions obtained using the dynamic programming
algorithm. Although the exact solutions are generally better than those found by
the heuristic, the data also indicate that for some cases the heuristic performs
very well. Other more sophisticated heuristics could also be considered at the
expense of increased running time.

Next, the dynamic programming formulation from Subsection 3.2 was used to
measure the number of hops required for the same parameters used in the above
experiments. The results are shown in Figure 4 for the case that the number
of transmitters was selected from the uniform [1, 3] distribution. Each curve
labeled with a value $h$ indicates the percentage of multicast requests satisfied
using at most $h$ hops from the source to any destination. We note that for
this data set no multicast request could be satisfied using fewer than 2 hops
and no multicast request required more than 7 hops. These results indicate the
relationship between hop counts and percentage of satisfied requests.

[Figure 3: exact and heuristic solutions for $0 \le t(v) \le 2$ and $1 \le t(v) \le 3$,
plotted against the mean number of available wavelengths per link.]

Fig. 3. Percentage of multicast requests satisfied as a function of number of available
wavelengths.

[Figure 4: curves labeled with $h$, plotted against the mean number of wavelengths
per link (2 to 9).]

Fig. 4. Percentage of multicast requests satisfied using at most h hops as a function of
the number of available wavelengths. Curves are labeled with h.



5 Conclusion and Future Research 

In this paper we have investigated the problems of multicast routing and wavelength
assignment. We have shown that the wavelength assignment problem for
any fixed multicast tree can be solved in time linear in the number of nodes
when the number of wavelengths per link, transmitters and receivers per node, 
and switch degree are constants. Moreover, we have demonstrated that the dy- 
namic programming algorithm for the wavelength assignment problem can be 
adapted to find wavelength assignments that minimize the maximum number 
of hops from the source to all destinations. Similar adaptations can be made to 
find solutions that are optimal with respect to other metrics. 

The algorithms described in this paper can be used either to find exact so- 
lutions to the wavelength assignment problem or to evaluate solutions found by 
faster and simpler heuristics. Heuristics for minimizing the maximum number 
of hops, transmitter and receiver usage, and other measures of optimality are 
currently under investigation.

Abstract. We present a critical study of the "timed token" real-time
communication protocol. This protocol has a drawback with respect to
asynchronous messages: if all stations of the network have permanent
synchronous and asynchronous messages, only the first station can transmit its
asynchronous messages, and only during a limited interval of time. Then, only
synchronous messages are transmitted until at least one station does not use
all its synchronous capacity for the transmission of synchronous messages. The
regular timed token protocol [7] was developed to solve this problem.
However, the problem still occurs when a station uses all its synchronous
capacity to send synchronous messages. The protocol proposed here, called the
improved timed token protocol, combines the key ideas of the two previous ones and
permits the transmission of asynchronous messages in some critical situations where
they cannot be transmitted when using either the timed or the regular timed token
protocol.

Keywords: local networks, real-time protocol, non-real-time messages, 
scheduling messages, timed token, regular timed token, scheduling constraints. 



1 Introduction 

We address the issue of improving the timed token medium access control (MAC) 
protocol. This protocol is suitable for real-time applications not only because of its 
use in high bandwidth networks but also due to the fact that it has the important 
property of bounded access time which is necessary for real-time communications. 
The timed token protocol has been incorporated into many network standards, 
including the Fiber Distributed Data Interface (FDDI), IEEE 802.4, the High Speed 
Data Bus and the High Speed Ring Bus (HSDB/HSRB), and the Survivable 
Adaptable Fiber Optic Embedded Network (SAFENET). Many embedded real-time 
applications use them as backbone networks. 

With the timed token protocol, messages are grouped into two separate classes : 
the synchronous class and the asynchronous class. Synchronous messages arrive in 
the system at regular intervals and may be associated with deadline constraints. The 
idea behind the timed token protocol is to control the token rotation time. At network 
initialization time, a protocol parameter called Target Token Rotation Time (TTRT) is 
determined which indicates the expected token rotation time. Each station is assigned 
a fraction of the TTRT, known as synchronous capacity, which is the maximum time 
for which a station is permitted to transmit its synchronous messages every time it
receives the token. Once a node receives the token, it transmits its synchronous
messages, if any, for a time no more than its allocated synchronous capacity. It can
then transmit its asynchronous messages only if the time elapsed since the previous
token departure from the same node is less than the value of TTRT, i.e., only if the
token arrived earlier than expected.

The "timed token" protocol has a drawback with respect to asynchronous traffic.
Indeed, if all the stations of the ring permanently have real-time (synchronous) and
non-real-time (asynchronous) messages, the first station transmits non-real-time
messages during a limited interval of time. Thereafter, only real-time messages are
transmitted until at least one of the stations does not use all its synchronous
capacity for real-time traffic.

The regular timed token protocol was developed to solve this problem.
Nevertheless, the problem persists when only one station of the ring has real-time
and non-real-time messages: it uses all its synchronous capacity to transmit only
the real-time messages.

Our contribution: the improved timed token protocol provides a solution to the
problems described above.

We present the "timed token" real-time MAC protocol and the regular timed token
protocol in section 2. In section 3, we present our approach in detail. Section 4
concludes the paper.



2 The Timed Token Protocol 

This protocol uses the following parameters:

- $T_{TRT}$ (Target Token Rotation Time) defines the target rotation time of the token.

- $H_i$, $i \in \{0, \ldots, m-1\}$ (synchronous capacity of node $i$), where $m$ is the number
of stations in the ring. This parameter represents the maximum time for which a
station is permitted to transmit synchronous messages every time the station
receives the token. Note that each station can be assigned a different $H_i$ value.
In this paper, we assume that $H_j = H_k$, $\forall j, k \in \{0, \ldots, m-1\}$.

- $TRT_k$ (Token Rotation Time) measures the cycle time (this counter is
initialized to the $T_{TRT}$ value and re-initialized to this value either when the token
arrives early at the station or when $TRT_k$ expires).

- $LC_k$ (Late Counter of node $k$). This counter is used to record the number of times
that $TRT_k$ has expired since the last token arrival at node $k$.

- $THT_k$ (Token Holding Time) defines the time during which station $k$ may
transmit non-real-time traffic.

Theoretically, the total available time to transmit synchronous messages during
one complete traversal of the token around the ring can be as much as $T_{TRT}$. However,
factors such as ring latency $\Theta$ and other protocol- or network-dependent overheads
reduce the total available time to transmit synchronous messages. We denote the
portion of $T_{TRT}$ unavailable for transmitting synchronous messages by $\tau$.

That is, $\tau = \Theta + \Delta$, where $\Delta$ represents the protocol-dependent overheads (the token
transmission time, asynchronous overrun, etc.). We define the ratio of $\tau$ to $T_{TRT}$ to be
$\alpha$. The usable ring utilization available for synchronous messages would therefore be
$(1 - \alpha)$.

Thus, a protocol constraint on the allocation of synchronous capacities is that the
sum total of the synchronous capacities allocated to all nodes in the ring should not be
greater than the available portion of the Target Token Rotation Time ($T_{TRT}$), i.e.,

$$\sum_{k=0}^{m-1} H_k \le T_{TRT} - \tau \qquad (1)$$

In the case studied below (figure 1), we consider $\tau = 0$.
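
The capacity constraint (1) lends itself to a mechanical check. The following sketch assumes that all times are expressed as whole numbers in a common unit; the class and method names are hypothetical and are not part of the protocol. The second method reflects the assumption made in this paper that every station receives the same capacity.

```java
/** Hypothetical helper for checking the synchronous-capacity constraint (1). */
final class CapacityCheckSketch {
    /** True when the sum of the capacities H_k does not exceed T_TRT minus tau. */
    static boolean satisfiesConstraint(long[] h, long ttrt, long tau) {
        long total = 0;
        for (long hk : h) {
            total += hk;
        }
        return total <= ttrt - tau;
    }

    /** Largest whole-number capacity that m stations can share equally under (1). */
    static long equalCapacity(int m, long ttrt, long tau) {
        return (ttrt - tau) / m;
    }
}
```

For example, with $T_{TRT} = 100$ and $\tau = 10$, three stations with capacity 30 each satisfy the constraint, whereas capacities of 40, 30, and 30 do not.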



Timed Token Protocol [3]

For each station k (k = 0, 1, 2, ..., m-1):
    THT_k ← 0;
    LC_k ← 0;                          /* initialization procedure */
    Start the countdown of TRT_k

While the network is working:
    If TRT_k = 0 then
        TRT_k ← T_TRT; LC_k ← LC_k + 1
    EndIf
    At the arrival of the token do:    /* data transmission */
    Case
      • LC_k = 0:    /* token early arrival case */
          THT_k ← TRT_k
          Start the countdown of TRT_k
          Transmit real-time messages during H_k
          Start the countdown of THT_k
          While THT_k > 0 and (non-real-time messages are waiting):
              Transmit non-real-time messages
          Pass the token to station (k+1) (modulo m)
      • LC_k = 1:    /* token late arrival case */
          LC_k ← 0
          Transmit real-time messages during H_k
          Pass the token to station (k+1) (modulo m)
      • LC_k > 1:    /* error case */
          «error recovery» procedure
    EndCase


END

Critique:

Let us consider the situation where, on the one hand, all stations permanently have
real-time traffic to transmit during their synchronous capacity and, on the other
hand, the first station uses all the time that it possesses to transmit non-real-time
traffic (it may transmit only during a limited interval of time) (figure 1).

The next diagram represents how transmission time on the network is filled.



[Figure 1: timeline showing $TRT_k$ and $THT_k$; the station transmits real-time
messages during $H_k$, then non-real-time messages.]

Fig. 1. Example of a critical case with the «timed token» protocol.

- We notice that in the most unfavorable case, all stations use all their synchronous
capacity and no longer give a chance to the non-real-time traffic. This situation
lasts until at least one station does not use the totality of its synchronous capacity.

- If all stations respect the protocol constraints, no $LC_k$ can reach a value
greater than 1 (error situation).



2.1 The Regular Timed Token Protocol [7] 

In this algorithm, the author considers $T_{TRT}$ not as the token target time, but as the
maximum time.

• We assign to a station k a synchronous capacity $H_k$ for the real-time traffic (if any).
If the synchronous capacity $H_k$ has not expired, the station transfers non-real-time
traffic until the expiration of $H_k$.

• The constraint (1) is valid for this algorithm. For the example of figure 2, we
assume that $\tau = 0$.

Algorithm of the regular timed token protocol

For each station k, k = 0, 1, 2, ..., m-1:
    THT_k ← 0;                         /* initialization procedure */
    Start the countdown of TRT_k

For each station k, k = 0, 1, 2, ..., m-1:
    If TRT_k = 0: «ring recovery procedure»    /* error */
    At the arrival of the token, Do:           /* data transmission */
        TRT_k ← T_TRT;
        Start the countdown of TRT_k and THT_k
        While THT_k > 0 and (real-time messages are waiting):
            Transmit real-time messages
        While THT_k > 0 and (non-real-time messages are waiting):
            Transmit non-real-time messages
        Pass the token to station (k+1) (modulo m)
    EndDo
End.



Critique:

1 - Let us consider the situation where all the stations of the ring have permanent
real-time traffic. Consequently, no asynchronous messages will circulate in the
network. By contrast, the timed token algorithm allows the first station to transmit,
at the beginning, non-real-time messages during a limited interval of time.

2 - When only one station of the network permanently has real-time and non-real-time
traffic, only real-time messages will be transmitted. This is the main drawback of
this variant (figure 2).



[Figure 2: timeline for station k showing the countdown of $TRT_k$ and of $THT_k$;
the station transmits real-time messages during $H_k$.]

Fig. 2. An example of a critical case with the regular timed token protocol.

The previous diagram shows that no non-real-time messages will be transmitted
although all the other stations remain idle.

The proposed protocol uses the same variables as those of the timed token
protocol with the introduction of a new variable $HR_k$ that denotes the time remaining
from $H_k$ after station k has sent all its real-time messages.



3 The Improved Timed Token Protocol 



Principle:

- We assign to each station a time capacity $H_k$ that represents the maximum time
during which it can transmit synchronous traffic.

- This protocol allows a station to transmit non-real-time traffic if either:

    - it receives the token early, or

    - $HR_k > 0$.

The second condition allows the current station to transfer its asynchronous messages
(if any), instead of sharing its remaining time with the other stations or
waiting for the next reception of the token to transfer non-real-time messages.

The constraint (1) is still valid for this algorithm. For the example of figure 3, we
assume that $\tau = 0$.



Algorithm of the improved timed token protocol

For each station k (k = 0, 1, 2, ..., m-1):
    THT_k ← 0
    LC_k ← 0                           /* initialization procedure */
    Start the countdown of TRT_k

While the network is working, Do:
    If TRT_k = 0 then
        TRT_k ← T_TRT; LC_k ← LC_k + 1
    EndIf
    At the arrival of the token Do:    /* data transmission */
    Case
      • LC_k = 0:    /* token early arrival case */
          THT_k ← TRT_k
          HR_k ← H_k
          Start the countdown of (TRT_k, HR_k)
          While HR_k > 0 and (real-time messages are waiting):
              Transmit real-time messages
          HR_k ← THT_k + HR_k
          Start the countdown of HR_k
          While HR_k > 0 and (non-real-time messages are waiting):
              Transmit non-real-time messages
          Pass the token to station (k+1) (modulo m)
      • LC_k = 1:    /* token late arrival case */
          LC_k ← 0
          Start the countdown of (TRT_k, HR_k)
          While HR_k > 0 and (real-time messages are waiting):
              Transmit real-time messages
          While HR_k > 0 and (non-real-time messages are waiting):
              Transmit non-real-time messages
          Pass the token to station (k+1) (modulo m)
      • LC_k > 1:    /* error */
          «error recovery» procedure
    EndCase
END



Let us consider the case where we have three stations in the ring such that, at the
beginning, the first station uses only a portion of its synchronous capacity to transmit
real-time messages. After that, all stations permanently possess constrained messages.

The next diagram represents how transmission time on the network is filled for the
timed token protocol and the improved timed token protocol:

Fig. 3. First comparison between the «timed token» protocol and the improved timed
token protocol: a) the first station waits for the next token arrival to transmit
non-real-time messages; b) it transmits them immediately. In this case we
assume that $LC_k$ was equal to 1 before the arrival of the token (at the beginning).

According to figure 3, the first station has real-time and non-real-time traffic. In
the "timed token" protocol, the non-real-time messages will be transmitted at the
second arrival of the token. However, the "improved timed token" protocol takes
advantage of the synchronous capacity left over after the transmission of the
real-time messages to transmit non-real-time messages at the first arrival of the token.
Another advantage of this protocol is the transmission of non-real-time messages
during the first reception of the token in the case where one station has only
non-real-time messages (without waiting for $LC_k$ to be 0). Each station that does not use
all its synchronous capacity uses its remaining time $HR_k$ for the transmission of
non-real-time traffic before passing the token to the following station (figure 4):



Fig. 4. Second comparison between the «timed token» protocol and the improved
timed token protocol. In this case, we assume that $LC_k$ was equal to 1 before the
arrival of the token (at the beginning).



We notice that, under the «timed token» protocol, station 2 transmits its non-real-time
messages by exploiting the remaining time of station 1. However, in the
improved timed token protocol, the first station transmits its own asynchronous
messages (if any) during its remaining time $HR_k$ before passing the token.

We give now a comparison between the regular timed token protocol and the 
improved timed token protocol. 

Let us consider the situation where there is only one station i that possesses
real-time and non-real-time traffic (figure 5):

Fig. 5. A comparison between the regular timed token and the improved timed token
protocols: a) non-real-time messages are not transmitted; b) non-real-time messages
are transmitted.

Although the station has non-real-time messages to transmit, it will not be able to do
so with the regular timed token protocol; instead, it will transmit only real-time
messages. With the improved timed token protocol, the transmission of non-real-time
messages is guaranteed.
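
The difference between the three protocols during a single token visit can be summarised with a deliberately simplified model. The sketch below is an assumption-laden illustration, not an implementation of any of the algorithms: it ignores the countdown of $TRT_k$, treats pending traffic as a known amount of transmission time, and represents an early token by a positive `earliness` value standing in for $THT_k$ (zero when the token is late). All identifiers are hypothetical.

```java
/** Hypothetical one-visit model of the three protocols; not a full simulation. */
final class TokenVisitSketch {
    enum Protocol { TIMED, REGULAR, IMPROVED }

    /**
     * Returns {synchronous time sent, asynchronous time sent} for one token visit.
     * h is the synchronous capacity H_k; earliness stands in for THT_k.
     */
    static long[] visit(Protocol p, long h, long earliness, long syncDemand, long asyncDemand) {
        long sync = Math.min(h, syncDemand);
        long leftover = h - sync; // plays the role of HR_k after the real-time messages
        long budget;
        if (p == Protocol.TIMED) {
            budget = earliness;            // asynchronous traffic only when the token is early
        } else if (p == Protocol.REGULAR) {
            budget = leftover;             // only the unused part of H_k
        } else {
            budget = leftover + earliness; // HR_k <- THT_k + HR_k in the early case
        }
        return new long[] { sync, Math.min(budget, asyncDemand) };
    }
}
```

In this model, a station that fills its whole capacity with real-time traffic sends no asynchronous traffic under the regular protocol, but does so under the improved protocol whenever the token arrives early, which is the situation of figure 5. Conversely, a late token with unused capacity yields asynchronous traffic under the regular and improved protocols but not under the timed token protocol.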



4 Conclusion 



We presented a critical study of the timed token protocol. After giving an
overview of the timed token protocol and the regular timed token protocol and
highlighting their drawbacks with respect to non-real-time traffic, we proposed an
enhancement to these algorithms. The improved timed token protocol allows
the transmission of asynchronous messages either without waiting for the next arrival
of the token or when time remains after transmitting real-time messages.

Abstract. With its central and unique ability to execute bytecode on any
platform, the Java programming language has gained increasing popularity in a
wide area of computation especially in internet-related applications. Even with
its broad applicability, the standard Java virtual machine is deficient in the
capability to express real-time constraints. In this study, a specification 
language is suggested to specify real-time constraints, which generates skeletal 
Java code containing the invocation of real-time APIs. The use of 
multithreading is proposed to implement real-time APIs without modifying the 
current Java semantics. This approach may expand the application area of Java 
applet with existing language features especially to specify soft real-time 
constraints for visual specification or modeling based on internet. With the 
suggested technique we can specify timing semantics including maximum, 
minimum, durational, and relative timing constraints. The detailed execution 
orders of multithreads to express many forms of timing constraints have been 
packaged into API libraries for maintainability and readability. 



1 Introduction 



Specification is concerned with what the software components of a system should do,
not with how it is to be implemented [1]. A specification differs from an
implementation in that there is no need for it to be efficient in the computational 
sense; rather a specification describes only the external behavior of the software [2]. 
A formal specification is easier to understand than a program because it is written in a 
language chosen for ease of expression [3]. Thus, a formal specification can also be 
useful for program documentation. To obtain a good specification, a formal 
specification language is used since it can be tested for ambiguity, consistency, and 
correctness [4]. 

The suggested formal specification language, temporarily named “Spec”, consists of
two main parts, i.e., the process list part and the timing constraints part. The necessary
processes used as events are declared in the process part, and the required timing
constraints related to the declared processes are specified in the timing part. The
processes used in the specification are translated into threads in Java. The syntactic
definition of Spec based on BNF is available in the Appendix. The most important
role of Spec is to generate appropriate APIs contained in generated skeletal Java
classes. The Spec compiler has been implemented with the automatic lexer and parser
generator based on the Java programming language, i.e., JLex and CUP [5].

The following example specification shows the real-time requirements for the 
movement of the robot arm of a welder machine, which has been used as an example 
in Section 3 of this paper. 

Spec Machine_Arm
Begin
    Process
        FirstTimer, SecondTimer,
        FirstEvent, SecondEvent,
        ThirdEvent, FourthEvent, FifthEvent, SixthEvent;
    Timing_Constraints
        MinimumTime:
            FirstEvent, SecondEvent, FirstTimer(5);
        MaximumTime:
            ThirdEvent, FourthEvent, SecondTimer(3);
        DurationTime:
            FifthEvent(4), SixthEvent;
End Machine_Arm

The reserved words of Spec are indicated by italic type in this example. The example 
specification declares two timers and six events which are supposed to be 
implemented with multithreading in the target Java classes. In the timing part of the 
specification, the occurring order of events constrained by a timing constraint is 
indicated by the actual sequence of events followed by timing information. 

The minimum timing constraint stated under the keyword, MinimumTime, in the
example specification indicates that at least five time units should elapse for the
second event, SecondEvent, to occur after the occurrence of the first event,
FirstEvent. The five time units in this case are indicated by the timer process,
FirstTimer(5). The role of a timer is explained in detail in Section 2. The maximum
timing constraint in the example specification indicates that the fourth event,
FourthEvent, occurs within three time units after the occurrence of the third event,
ThirdEvent. For the durational timing constraint of the example, the sixth event,
SixthEvent, occurs after the four time units of the fifth event’s duration,
FifthEvent(4).

Timing constraints are concerned with the absolute timing of events and the
relative order in which actions or stimuli are produced. Absolute timing constraints
represent the actual time of the start or finish of an event [6], and are categorized as
follows [7]:

Maximum timing constraints demand that no more than t amount of time may
elapse between the occurrence of one event and the occurrence of another;

Minimum timing constraints stipulate that no less than t amount of time may
elapse between two events; and

Durational timing constraints express that an event may occur for t amount of
time.

Real-time systems are those in which the correctness of the system depends not only 
on the results of computation but also on the time at which the results are produced [8, 
9]. Real-time systems are often termed hard real-time systems, as opposed to soft
real-time systems [10]. The degree of dependency on the timing constraints is the
only criterion used to differentiate between hard and soft real-time systems. In soft
real-time systems, slow response time (or missed deadline) can be tolerated as long as 
1) not too many deadlines are missed, and/or 2) deadlines of real-time processes are 
not missed by much [11]. 

Currently it is very hard to expect hard real-time performance with Java applet 
programming mainly due to slow execution speed of the Java virtual machine [12]. 
To improve the execution speed of Java bytecode, the Just-In-Time (JIT) compiler 
compiles the bytecode into native code. The result of JIT compilation is cached to be 
called again when needed [13]. The use of hardware execution of Java bytecode is 
another approach to run Java code even faster than using JIT compilers [14]. 
Sophisticated compilers have been tried to conserve system resources and run the 
cache more efficiently without using dedicated Java chips [15]. The current approach
to guaranteeing deterministic behavior in multithreading is to use an extended Java
virtual machine running on a real-time operating system to obtain improved
thread scheduling and run-time performance [16]. Two groups are working to
standardize the extensions to Java for real-time embedded applications: the
J Consortium led by Hewlett-Packard and the Real Time Experts Group led by
Sun Microsystems [17].



2 Absolute and Relative Timing Constraints Expressed 
with Multithreading 



In maximum and minimum timing constraints described in this section, the time span 
of the timer is assumed to be much longer than the time span of two events bounded 
by the given timing constraint. In case of relative timing constraint, the durational 
time span of the second event is assumed to be not shorter than the time span of the 
first event. The “First_Event” and “Second_Event” used in the timing diagram of all 
the tables represent the events of the API code snippet, “A” and “B,” respectively. 



2.1 Maximum Timing Constraints 

To express maximum timing constraints between two events, the multithreaded 
concurrency is used to express the maximum time span during which the second event 
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should occur. Table 1 shows the API code snippet and the corresponding timing 
diagram for implementing absolute maximum timing constraints. The implementation 
code of the suggested APIs is available when requested to the author. The following 
shows the specification statement with Spec and the corresponding prototype of the 
API for a maximum timing constraint: 

MaximumTime ; First_Event, Second_Event , Timer (3 ); 
public static void maximumTimingConstraint ( 
j ava . lang . Thread First_Event, j ava . lang . Thread Timer, 

j ava . lang . Thread Second_Event ) 



The join() method called on the timer object after the start of “Second_Event”
guarantees that “Second_Event” finishes before the end of the timer duration. The
join() method invoked on “First_Event” makes “Second_Event” start after
“First_Event.” The durations of “First_Event” and “Second_Event” are assumed to
be much less than the time span of the timer thread. The timing diagram shows that no
more than “Max_Time” may elapse between the occurrences of the two events,
“First_Event” and “Second_Event.” The timer event is used to count down the
“Max_Time” span.



Table 1. Sequenced order of thread events to express maximum timing constraints 



API code snippet:

A.start();
timer.start();
try {
    A.join();
    B.start();
    timer.join();
} catch (...) {}

Timing relationship among threaded events:

[Timing diagram. Labels: “First_Event”; “Max_Time” span; duration of “Second_Event”;
the time span of the timer thread; horizontal axis: Time.]
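
The call order of Table 1 can be sketched as a small Java method. This is only a minimal sketch under stated assumptions: the paper gives the prototype and the call order but not the method body, so the body below, the class name MaxTimingSketch, and the helper timerThread (a timer modeled as a thread that sleeps) are hypothetical. Note that join() only waits; the sketch relies on the stated assumption that both events are much shorter than the timer span.

```java
final class MaxTimingSketch {

    // Hypothetical body: the paper gives only the prototype and the call order of Table 1.
    static void maximumTimingConstraint(Thread firstEvent, Thread timer, Thread secondEvent) {
        firstEvent.start();
        timer.start();               // the timer begins counting the Max_Time span
        try {
            firstEvent.join();       // Second_Event may start only after First_Event ends
            secondEvent.start();
            timer.join();            // return only once the Max_Time span has elapsed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
        }
    }

    // Assumed helper: a timer modeled as a thread that sleeps for the given span.
    static Thread timerThread(long millis) {
        return new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```

A caller would pass the two event threads and a timer built with timerThread(spanInMillis); the method returns once the timer thread has terminated.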



2.2 Minimum Timing Constraints 

The minimum time that must elapse between two occurrences of events is one of
the following three cases:

1) Case One: The time span between the start of the first event and the start of the 
second event; 

2) Case Two: The time length between the finish of the first event and the start of the 
second event; and 

3) Case Three: The time interval between the start of the first event and the finish of 
the second event. 

Table 2. Sequenced thread events to express minimum timing constraints

API code snippet:

A.start();
timer.start();
try {
    A.join();
    timer.join();
    B.start();
} catch (...) {}

Timing relationship constrained by minimum timing constraint:

[Timing diagram. Labels: “First_Event”; “Second_Event”; minimum time span of the timer;
horizontal axis: Time.]


For Case One, the durational time of the first event is contained in the minimum 
timing constraint between two events because the second event is supposed to occur 
after a minimum time after the start of the first event. For Case Two, a minimum time 
elapses after the finish of the first event before the start of the second event. The 
minimum time between two events in Case Three contains both durational times of 
the two events. Table 2 shows the API code snippet and the corresponding timing 
diagram for Case Two. 

The specification statement and the generated prototype of the API for a minimum 
timing constraint are as follows: 

MinimumTime : FirstEvent, SecondEvent, Timer (5);
public static void minimumTimingConstraint (
    java.lang.Thread First_Event, java.lang.Thread Timer,
    java.lang.Thread Second_Event )

The program calls join() on the timer object to guarantee that the timer finishes
before the second event starts. The timing diagram shows that “Second_Event” occurs
only after the minimum time period imposed by the timer thread has elapsed.
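
The sequence of Table 2 can be sketched in the same way. This is a minimal sketch, not the generated API itself: the method body and the class name MinTimingSketch are assumptions, since the paper provides only the prototype and the call order. The timer is assumed to be any thread whose run time equals the minimum span.

```java
final class MinTimingSketch {

    // Hypothetical body following the call order of Table 2.
    static void minimumTimingConstraint(Thread firstEvent, Thread timer, Thread secondEvent) {
        firstEvent.start();
        timer.start();               // the timer begins counting the minimum span
        try {
            firstEvent.join();       // wait for First_Event to finish
            timer.join();            // wait until the minimum span has elapsed
            secondEvent.start();     // only now may Second_Event begin
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
        }
    }
}
```

Compared with the maximum-constraint order, the only change is that timer.join() precedes the start of the second event, so the second event cannot begin before the timer thread has terminated.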



2.3 Durational Timing Constraints 

A durational timing constraint imposes some fixed amount of time on the duration of 
an event. For example, an event with some duration in a reactive system may activate 
or deactivate an external device. In this case, the activated or deactivated state of the 
external device controlled by the signal (event) is constrained by the durational time 
of the controlling event. Table 3 shows the API code snippet and the corresponding 
timing diagram. 

Table 3. Expression of durational timing constraints




As illustrated in Table 3, the join() method called on “First_Event” guarantees that
“First_Event” finishes before “Second_Event” starts. The specification statement and
the prototype of the corresponding API for a durational timing constraint are as
follows:

DurationTime : First_Event (4), Second_Event;
public static void durationalTimingConstraint (
    java.lang.Thread First_Event,
    java.lang.Thread Second_Event )



2.4 Relative Timing Constraints 

There are seven relations between intervals [18] as illustrated in Fig. 1. All the 
relations in the figure can be used as the types of relative timing constraints between 
two periodic or aperiodic events. 



(a) A before B   (b) A finishes B   (c) A equal B   (d) A during B

(e) A meets B   (f) A starts B   (g) A overlaps B

Fig. 1. Seven interval relations between two events, A (event A) and B (event B)

The relative timing constraints of the form “A before B,” “A meets B,” “A overlaps B,”
and “A during B” can be expressed by combining the join() method with linearly
sequenced generation of events. However, it is very difficult, if not impossible, to
express relative timing constraints such as “A equal B,” “A finishes B,” and
“A starts B” because of the non-deterministic behavior of multithreading and the lack
of parallel constructs in Java. Table 4 shows the two types of relative timing
constraints, “A meets B” and “A overlaps B,” which can also be applied to express the
other two types, “A before B” and “A during B.”



Table 4. API code snippets for “A overlaps B” and “A meets B” 



API code snippet for “A overlaps B”:

A.start();
try {
    B.start();
    A.join();
} catch (...) {}

API code snippet for “A meets B”:

A.start();
try {
    A.join();
    B.start();
} catch (...) {}




In the API for implementing “A overlaps B,” the start of the first event precedes the
start of the second event. The first event finishes before the second event ends,
under the assumption that the duration of the second event is not less than the
duration of the first event. In the API for “A meets B,” the second event is forced
to start after the finish of the first event, on which join() is invoked. When join()
on the first event returns, the first event is guaranteed to have finished before the
start of the second event. The specification statements and the prototypes of the
generated APIs for the relative timing constraints “A overlaps B” and “A meets B” are
as follows:

RelativeTimeMeet : Event_A, Event_B;
public static void relative_Timing_A_meets_B (
    java.lang.Thread Event_A, java.lang.Thread Event_B)

RelativeTimeOverlap : Event_A, Event_B;
public static void relative_Timing_A_overlaps_B (
    java.lang.Thread Event_A, java.lang.Thread Event_B)
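
The two prototypes can be given bodies that follow the snippets of Table 4. This is a hedged sketch: the method names are taken from the prototypes, but the bodies and the class name RelativeTimingSketch are assumptions. The only difference between the two methods is whether B is started before or after join() is called on A.

```java
final class RelativeTimingSketch {

    // "A meets B": B is started only after A has finished.
    static void relative_Timing_A_meets_B(Thread eventA, Thread eventB) {
        eventA.start();
        try {
            eventA.join();           // returns once A has terminated
            eventB.start();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
        }
    }

    // "A overlaps B": B is started while A may still be running.
    static void relative_Timing_A_overlaps_B(Thread eventA, Thread eventB) {
        eventA.start();
        try {
            eventB.start();          // B begins after A has been started
            eventA.join();           // wait for A; B is assumed to outlast A
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the overlap case the actual interleaving is left to the thread scheduler, which is why the text assumes that the second event lasts at least as long as the first.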



3 Example Visual Specification of a Welder Robot Arm 

The example machine to be prototyped with a Java applet consists of an upper arm, a
lower arm, and a welder tip. The movement steps of the example machine are as
follows:

(1) The machine stretches the lower arm after the predefined minimum time, five time
units, has elapsed, as illustrated in (b) of Fig. 2;

(2) The tip is stretched within the predefined maximum time span, three time units,
as illustrated in (c) of Fig. 2; and

(3) The stretched tip is retracted to the lower arm after the predefined durational
time span, four time units, as illustrated in (d) of Fig. 2.




(a)   (b)   (c)   (d)

Fig. 2. Prototype of a welder robot arm with the movement sequence of a (initial state), b
(stretching arm), c (stretching tip), and d (retracting stretched tip)



The example specification in Section 1 is used to specify the real-time requirements 
needed for the movement of the example robot arm. The part of the specification 
related to the real-time requirements of the robot arm is as follows: 



Timing_Constraints
    MinimumTime :
        FirstEvent, SecondEvent, FirstTimer (5);
    MaximumTime :
        ThirdEvent, FourthEvent, SecondTimer (3);
    DurationTime :
        FifthEvent (4), SixthEvent;

The above specification statements generate Java skeletal classes containing the
three real-time APIs for simulating the robot arm as follows:

(1) minimumTimingConstraint(First_Event, First_Timer, Second_Event);

(2) maximumTimingConstraint(Third_Event, Second_Timer, Fourth_Event); and

(3) durationTimingConstraint(Fifth_Event, Sixth_Event).

The Java skeletal classes generated by the Spec compiler for this example can be
obtained from the author on request. The first API call, minimumTimingConstraint,
draws the stretched lower arm after the predefined minimum time specified by the
timer, FirstTimer. The second API call, maximumTimingConstraint, draws the protrusion
of the welder tip within the predefined maximum time specified by the timer,
SecondTimer. The retraction of the tip after the durational time specified by
Fifth_Event is expressed by calling the API durationTimingConstraint.

4 Conclusions



Using the “Spec” specification language, the functional requirements of a real-time
system can be specified without the mental burden of raw Java programming. The Spec
compiler generates skeletal Java classes containing real-time APIs. The suggested
real-time APIs, which use multithreading to express soft real-time constraints in a
Java applet, neither affect nor complicate the current syntax and semantics of Java.
The use of the proposed specification language and the added capability to express
timing constraints in a Java applet may extend the applicability of the language to
web-based visual specification or modeling that requires real-time behavior.
Encapsulating the implementation of timing constraints in the suggested APIs also
preserves the modular maintainability of Java programs.

To express hard real-time constraints, the Java virtual machine needs to be improved
to overcome the problems related to quasi-parallelism, automatic garbage collection,
and the non-determinism of multithreading. The fact that Java does not support true
parallelism means that using the concurrency of multithreading to express timing
constraints involves some delicate scheduling complexity.

Currently, proprietary extensions to the Java virtual machine and native methods are
used in applications that emphasize code speed. Even though the current standard Java
virtual machine does not support hard real-time performance, the suggested
specification technique and the Java applet with real-time APIs can be used to express
soft real-time visual specifications of mechanical movement. More research is needed
to refine the specification technique and to extend the APIs so that they abstract
real-time constraints over many thread events, which are meant to prototype the
movement of multiple separate machine components related to one another by timing
constraints.

Appendix: BNF Syntactic Definition of Spec

Rules and shorthand to read the definition are as follows:

1. terminals in italic type;

2. one or more instances by the unary postfix operator +; and

3. a character class by the notation [...].

specification ::= Spec identifier Begin body End identifier
body ::= Process ThreadList ; Timing_Constraints TimingList
ThreadList ::= ThreadList , ThreadName | ThreadName
ThreadName ::= identifier | identifier ( digit )
TimingList ::= TimingList TypeOfTiming | TypeOfTiming
TypeOfTiming ::= TimingType : ThreadList ;
TimingType ::= MaximumTime | MinimumTime | DurationTime
             | RelativeTimeMeet | RelativeTimeOverlap
identifier ::= [a-zA-Z_] [a-zA-Z_]*
digit ::= [0-9]+
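
To illustrate how the TypeOfTiming rule reads in practice, the following is a minimal sketch of a recognizer for a single timing statement, written with java.util.regex. It is not the Spec compiler: the class name SpecTimingParser, the method parse, and the choice to return a flat list of strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpecTimingParser {

    // TypeOfTiming ::= TimingType : ThreadList ;
    private static final Pattern STATEMENT = Pattern.compile(
        "\\s*(MaximumTime|MinimumTime|DurationTime|RelativeTimeMeet|RelativeTimeOverlap)"
        + "\\s*:(.*);\\s*");

    // ThreadName ::= identifier | identifier ( digit )
    private static final Pattern THREAD_NAME = Pattern.compile(
        "\\s*([a-zA-Z_][a-zA-Z_]*)\\s*(?:\\(\\s*([0-9]+)\\s*\\))?\\s*");

    /** Returns the timing type followed by each thread name, as "name" or "name(n)". */
    static List<String> parse(String statement) {
        Matcher m = STATEMENT.matcher(statement);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a TypeOfTiming statement: " + statement);
        }
        List<String> out = new ArrayList<>();
        out.add(m.group(1));
        for (String part : m.group(2).split(",")) {   // ThreadList is comma-separated
            Matcher t = THREAD_NAME.matcher(part);
            if (!t.matches()) {
                throw new IllegalArgumentException("bad ThreadName: " + part);
            }
            out.add(t.group(2) == null ? t.group(1) : t.group(1) + "(" + t.group(2) + ")");
        }
        return out;
    }
}
```

For example, the statement "MinimumTime : FirstEvent, SecondEvent, FirstTimer (5);" yields the timing type followed by the three thread names, with the timer's argument attached to its name.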

Abstract. The number of applications requiring real-time transmission of
multimedia streams over the existing network structure is increasing. Large 
amounts of digital video, audio and text data of continuous nature and with 
associated timing constraints are thus being exchanged, making network 
overloads more likely. Our paper presents a feedback-controlled solution for 
ensuring a continuous transmission and play-out of MPEG streams even in 
congested network conditions. A double-channel link (TCP and UDP) allows 
data to be sent from the server to the client, and control information describing 
the quality of the transmission to be sent between client and server. The control 
data is collected by a special unit located at the client which regularly inspects 
the receiver and driver buffers. The control data received from the client is 
processed by a feedback manager at the server which controls a transmission 
shaper. The latter can influence the transmission process by modifying 
parameters such as the transmission buffer size and the transmission frequency. 
Experimental results show improved behavior of the system in congested 
network conditions. 



1. Introduction 

In the early days of the Internet, the transmission of text-only documents, and later 
text with static pictures, was considered more than enough by most users. Now 
requests for multimedia data transfers between the growing number of clients and 
servers are increasing rapidly. All the components (audio, video, data and text) have 
to be transferred over the current infrastructure which is still using window-based 
flow control, unsuitable for transmitting continuous streams. New protocols such as
the Real-time Transport Protocol (RTP) [1], the Real Time Streaming Protocol (RTSP)
[2], and the Xpress Transport Protocol (XTP) [3] have been deployed to try to improve
multimedia transmission and streaming. Their main concern is to meet the users'
Quality of Service (QoS) requirements. Unfortunately all of them rely on the existing
IP network elements which treat all transmitted packets equally in so-called "best- 
effort" service. Extensive research has been done and different proposals for ensuring 
QoS have been made, such as Integrated Services, Differentiated Services, 
Multiprotocol Label Switching, or Constraint Based Routing [4]. The cost of 
implementing these mechanisms has resulted in limited use to date by the large
majority of users. Thus proposals for trying to ensure a better delivery of continuous 
media streams over the existing infrastructure are of great interest. 

Apart from the continuous nature of the multimedia streams, their large size causes 
problems. Different compression techniques were developed in order to reduce the 
quantity of data being transmitted. Among them, Moving Picture Experts Group
(MPEG) encoding [5, 6] proved to be one of the best, not only from the compression 
ratio point of view, but also because its structure allows stream playing even in the 
case of random packet loss [7]. Unfortunately MPEG transmission is very bursty, and 
even when compressed the stream size is not small. Its transfer over the network may 
create congestion, especially when the network is already loaded. 

This paper proposes a traffic shaping scheme to improve MPEG stream 
transmission from a server to a client over an IP network. The mechanism tries to 
reduce the burstiness of MPEG transmission and to lower the effect of network 
congestion. It uses feedback control information sent from the client to the server, 
informing it about the quality of the received stream. The server uses this control data 
in order to adapt its transmission policy, whilst the client continues to send feedback 
information. The proposed mechanism has been implemented and tested by a client- 
server system as described in the next section. 



2. The Client-Server System's Overview 



The system consists of two applications: the server and the client (Fig. 1). 

Fig. 1. The structure of the client-server system

The Acquiring Unit and the MPEG Encoder card are used by the server to capture 
and compress the multimedia streams. The server's Connection Manager helps it to 
listen for incoming connection requests from the clients, and, once they have been 
accepted, to maintain them. The latter is also the purpose of the client's Connection 
Manager. The MPEG Decoder, the Synchronization and the Display and Play Units 
are in charge of transforming the received data into multimedia streams and of 
displaying or playing them to the client. The Feedback Indication Unit and the 
Feedback Manager have an important role in implementing the feedback control 
scheme. The communication between the server and client is done via a double TCP
and UDP channel described in detail in [8]. This combines the advantages of both 
protocols: reliability for the control TCP connection, and speed and multicasting for 
the unreliable UDP channel used for data transfer. 

The client-server system has been implemented in Visual C++ 6.0, using an object- 
oriented approach. It makes use of the multi-threading support offered by the 
implementation environment. The message and event handling by both the client and 
the server applications is supported by the Windows event and messaging system. The 
sockets and all the mechanisms necessary for inter-application communication are 
provided by the Windows Sockets 2 (WinSock2) architecture. MPEG streaming, 
buffering, decoding and playing [9], as well as the mechanisms described in detail in 
this paper, have been implemented and tested by the authors. 



3. The Server Transmission Shaper 

Processing MPEG streams (decoding, playing, transmission) is very demanding in 
resources such as CPU, memory and bandwidth [10]. The recent availability of very 
high-performance CPUs (currently 1.5 GHz) and the continuously-reducing price per 
megabyte of RAM memory has not been accompanied by a similar significant 
increase in bandwidth, which still remains the bottleneck for any transmission of 
multimedia data. The relative sizes of I, P and B MPEG video frame types make the 
flow very bursty and sensitive to loss and jitter. Besides, in the majority of cases, a 
big difference between the peak and mean bit-rates adds an extra burstiness to the 
MPEG video flow [11]. 

The traffic shaper we propose reduces the burstiness of the transmission by 
introducing a double control of the sending process. Both buffering mechanism and 
transmission scheme are supervised and controlled by the Feedback Manager (Fig. 2). 




Fig. 2. The Traffic Shaper uses feedback information to adjust the transmission parameters 

The scheme implements a modified token bucket mechanism [12] to which a
variable token generation procedure has been added. A feedback-controlled timer, 
part of the Transmission Control Unit, generates tokens used for sending data packets. 
Once the timer has been started, it continuously generates tokens with a certain 
frequency. Thus for periods of time the transmission bit-rate can be considered 
constant. The main advantage of such a scheme is that it reduces the burst caused by 
sending MPEG video frames, by spreading data to be sent over a larger period of time 
(Fig. 3). This may add delay to the last packets belonging to some of the frames, but it 
has been found experimentally that the mean value for the packet delay doesn't differ 
too much from the one experienced with the normal transmission scheme (Section 5). 
The client buffering and other features of our feedback scheme compensate for this 
small disadvantage, and as a matter of fact the computed jitter decreases. 
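
The token mechanism described above can be sketched as a token bucket whose generation interval can be changed at run time. This is an illustrative sketch only, not the authors' Visual C++ implementation: the class name VariableRateTokenBucket, the bucket capacity, and the use of explicit millisecond timestamps instead of a real timer are assumptions chosen to keep the example deterministic.

```java
final class VariableRateTokenBucket {
    private final int capacity;      // assumed cap on stored tokens
    private long intervalMillis;     // time between generated tokens (feedback-controlled)
    private long lastTokenAt;        // time at which the most recent token was generated
    private int tokens;

    VariableRateTokenBucket(int capacity, long intervalMillis, long startMillis) {
        if (capacity <= 0 || intervalMillis <= 0) {
            throw new IllegalArgumentException("capacity and interval must be positive");
        }
        this.capacity = capacity;
        this.intervalMillis = intervalMillis;
        this.lastTokenAt = startMillis;
    }

    // Generates every token that has become due by nowMillis.
    private void advance(long nowMillis) {
        while (nowMillis - lastTokenAt >= intervalMillis) {
            lastTokenAt += intervalMillis;
            if (tokens < capacity) {
                tokens++;
            }
        }
    }

    /** A packet may be sent only when a token is available; one token is consumed. */
    boolean trySend(long nowMillis) {
        advance(nowMillis);
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }

    /** Re-initialises the timer with a new token interval, as a feedback decision would. */
    void setIntervalMillis(long nowMillis, long newIntervalMillis) {
        if (newIntervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        advance(nowMillis);
        intervalMillis = newIntervalMillis;
        lastTokenAt = nowMillis;
    }
}
```

Because a packet leaves only when a token is available, a large frame is spread over several token intervals instead of being sent as one burst, and shortening or lengthening the interval changes the effective bit-rate.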

Fig. 3. The transmission traffic shaper reduces the burstiness of the transmitted MPEG stream
a) The normal transmission case b) The transmission with the traffic shaper scheme 

Besides its role in reducing the burstiness of the MPEG transmission, the feedback- 
controlled generation of tokens allows the modification of the transmission bit-rate by 
varying the transmission control's timer frequency. The timer will be reinitialised, and 
will generate tokens with a different frequency, any time the Feedback Manager 
instructs it to, after analysing the feedback data received from the client. The 
measures it takes depend not only on the current feedback data, but also on their 
fluctuation in time, thus preventing changing the server's state too frequently. For the 
case of a multicasting transmission, a simple arbitration scheme has been designed, 
taking into account which measure is best for the majority of the clients. 

The Feedback Manager controls the transmission buffering as well. Experiments show
that losing a large packet affects the transmitted stream more than losing a small
one (Fig. 5). Sending large packets also introduces extra burstiness to the
transmission. Smaller packets, on the other hand, carry proportionally more overhead
than larger ones. Packets that are too small not only require the client to receive
them quickly enough to avoid losing them, but also require time for reordering,
because they are more likely to arrive out of order than larger packets. Thus the
Feedback Manager has to dynamically find a trade-off between the transmission 
frequency and the size of the packets, to maximise the performance of the 
transmission. For taking its decisions, the Feedback Manager relies on control 
feedback data received from the Feedback Indication Unit situated at the client and 
described in the next section. 
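
The two sides of this trade-off can be made concrete with a small calculation. The sketch below is only illustrative: it assumes a 20-byte IPv4 header without options plus an 8-byte UDP header per packet, and the class and method names are hypothetical. The paper does not state which header sizes applied in its experiments.

```java
final class PacketOverheadSketch {

    // Assumption: 20-byte IPv4 header (no options) plus 8-byte UDP header per packet.
    static final int HEADER_BYTES = 20 + 8;

    /** Fraction of each transmitted packet that is header rather than stream data. */
    static double overheadFraction(int payloadBytes) {
        return (double) HEADER_BYTES / (HEADER_BYTES + payloadBytes);
    }

    /** Share of the whole stream that disappears when one packet of this size is lost. */
    static double lossShare(int payloadBytes, long streamBytes) {
        return (double) payloadBytes / streamBytes;
    }
}
```

Under these assumptions, a 72-byte payload spends 28% of each packet on headers, while a 972-byte payload spends 2.8%; conversely, a single lost packet removes proportionally more of the stream as the payload grows.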

4. The Client Feedback Indication Unit

The client application consists of a receiver thread (with a higher priority), called
by the Windows framework every time an incoming data packet arrives, an MPEG decoder
thread which decodes received data, and a player thread in charge of displaying and
playing decoded data. Two buffers are shared by the client threads: a receiver buffer
and a driver buffer for the audio and video streams (Fig. 4).









Fig. 4. The client's structure includes a Feedback Indication Unit 

The purpose of the Feedback Indication Unit is to collect data about the quality of
the transmission where it can be best judged: at the client. It analyses both the
receiver buffer and the driver buffer occupancies, and updates statistics in real
time on the number of lost, late, and out-of-order frames. The Unit repeatedly sends
control messages to the server carrying reports about the state of the reception.
They are processed by the server's Feedback Manager, which takes decisions in real
time to improve the quality of the transmission.



5. Experimental Results 

In Fig. 5a we plot the number of lost packets when different transmission packet sizes 
have been sent. Although the number of lost packets doesn't vary much, it slightly 
decreases with the size of the transmitted packet. But, as mentioned in Section 3, the 
percentage of data lost (Fig. 5b) is much higher if large packets don't arrive. 

To compute the one-way delay of the packets, the clocks of the sending and receiving
computers must be synchronized. Very precise synchronization requires special devices
such as GPS, an atomic clock, or an ISDN synchronous clock board [13]. Our experiments
deal with delays on the order of milliseconds, so we can use NTP [14] to synchronize
both the server's and the client's clocks, by connecting to the Atomic Clock time
server in Boulder, Colorado (USA) and adjusting both computers' clocks to match the
atomic clock value.
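
Once the clocks are synchronized, the delay computation itself is simple arithmetic on timestamps. The sketch below is a hedged illustration: the class and method names are hypothetical, and since the paper does not give its jitter formula, jitter is assumed here to be the absolute change in one-way delay between consecutive frames.

```java
final class DelayJitterSketch {

    /** One-way delay per frame: receive time minus send time (clocks assumed synchronized). */
    static long[] oneWayDelays(long[] sentMillis, long[] receivedMillis) {
        if (sentMillis.length != receivedMillis.length) {
            throw new IllegalArgumentException("timestamp arrays differ in length");
        }
        long[] delays = new long[sentMillis.length];
        for (int i = 0; i < delays.length; i++) {
            delays[i] = receivedMillis[i] - sentMillis[i];
        }
        return delays;
    }

    /** Assumed jitter definition: absolute change in delay between consecutive frames. */
    static long[] jitter(long[] delays) {
        long[] out = new long[Math.max(0, delays.length - 1)];
        for (int i = 1; i < delays.length; i++) {
            out[i - 1] = Math.abs(delays[i] - delays[i - 1]);
        }
        return out;
    }
}
```

For frames sent at 0, 40, and 80 ms and received at 300, 345, and 378 ms, the delays are 300, 305, and 298 ms, and the jitter values under this definition are 5 and 7 ms.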

Fig. 5. a) Number of Lost Frames vs. Frame Size b) Percentage of Lost Frames vs. Frame Size



[Figure panel: One-way Frame Delay on a LAN]


Fig. 6. Transmission Without Traffic Shaping Scheme a) One-way Frame Delays On A Normal 
LAN b) One-way Transmission Jitter On A Normal LAN 




Fig. 7. Transmission Without Traffic Shaping Scheme a) One-way Frame Delays On A 
Congested LAN b) One-way Jitter On A Congested LAN 

To compare the behavior of our client-server system with and without the traffic
shaping scheme, we first analysed the results obtained during the transmission of a
9-second MPEG system stream (1.6 Mbytes) over a LAN in two different conditions.
Fig. 6 shows the one-way delay and jitter measured on a normally loaded LAN without
traffic shaping, while Fig. 7 shows the results obtained for transmission over a
congested LAN without traffic shaping.

Fig. 8. Transmission With Traffic Shaping Scheme a) One-way Frame Delays On A Normal
LAN b) One-way Transmission Jitter On A Normal LAN

In Fig. 8a and Fig. 9a we plot the one-way delays for the same 9-second MPEG 
system stream over the same LAN in the same two cases (a normally-loaded network 
and a congested one), this time using the traffic shaping scheme. In both cases, the 
extra delay added by our traffic shaper is small, and in both cases, the jitter decreases 
(Fig. 8b and Fig. 9b). 




Fig. 9. Transmission With Traffic Shaping Scheme a) One-way Frame Delays On A Congested
LAN b) One-way Jitter On A Congested LAN

Our experiments consisted of transmissions of the same MPEG stream over a LAN
during the off-peak and peak hours of a day, when the network is normally loaded
and congested, respectively. The client's Feedback Indication Unit took into account
both the receiving buffer occupancy and the number of packets that were lost during
the transmission, arrived too late for play-out, or arrived out of order. Fig. 10
captures the dynamics of the server state during the transmission. The server
changes its state asymmetrically after receiving and analysing control messages from
the client. If a report of decreasing reception quality at the client side is
received, the server immediately moves to a new state with lower-quality
transmission. If the feedback control information reports an improvement in the
receiving quality, the server waits for a number of successive positive reports
before moving to a higher-quality transmission state.

Because the server assumes at the beginning that the highest quality transmission 
state can be maintained, for all the transmissions, regardless of the state of the 
network, we noticed a transition period when the server is searching for its right state.
Eventually, after a few transitions between states, the server stabilises and continues 
the transmission at a certain level of quality. In the case of a congested network (Fig. 
10b) this level is lower than the one experienced in a normally-loaded network (Fig. 
10a). Also it is more likely to have further state transitions in a congested network 
than in one with normal loading. 
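
The asymmetric policy described above can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the paper does not give the number of states or the number of successive positive reports required, so both are constructor parameters here, reports are simplified to a boolean, and the class name ServerQualityState is hypothetical.

```java
final class ServerQualityState {
    private final int highest;            // index of the highest-quality state
    private final int reportsToUpgrade;   // assumed number of successive positive reports
    private int state;
    private int positiveStreak;

    ServerQualityState(int highest, int reportsToUpgrade) {
        if (highest < 0 || reportsToUpgrade < 1) {
            throw new IllegalArgumentException("invalid parameters");
        }
        this.highest = highest;
        this.reportsToUpgrade = reportsToUpgrade;
        this.state = highest;             // the server starts in the highest-quality state
    }

    /** Processes one client report and returns the resulting server state. */
    int onReport(boolean qualityImproved) {
        if (!qualityImproved) {
            positiveStreak = 0;
            if (state > 0) {
                state--;                  // degrade immediately on a negative report
            }
        } else if (++positiveStreak >= reportsToUpgrade) {
            positiveStreak = 0;
            if (state < highest) {
                state++;                  // upgrade only after a run of positive reports
            }
        }
        return state;
    }
}
```

Requiring a run of positive reports before upgrading is what keeps the server from oscillating between states when the reception quality fluctuates.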



[Figure panel: Transitions of the Server State in a Normal LAN; horizontal axis: Time]




Fig. 10. Server State Transitions a) On a Normal LAN b) On A Congested LAN 



6. Conclusions and Further Work 

The main idea of this paper is to present a mechanism for traffic shaping of 
multimedia transmissions driven by the feedback control messages sent by the client. 
Different criteria were used for generating the reports carried by the feedback control 
messages. The dynamics of both the receiver and the driver buffer occupancies were 
taken into account, as well as the number of lost or late packets. The use of other 
metrics may be the subject of future investigation. The feedback data are collected 
and analysed by the server which takes decisions in order to improve the 
transmission. Varying the transmitted packet size and adjusting the transmission bit- 
rate were taken into account as possible server measures. They are applied effectively 
by a specially-designed transmission traffic shaper. 

The application of our traffic-shaping scheme makes possible a trade-off between
the continuity of the transmitted stream and its quality. In the majority of cases it
is preferable to continue streaming and displaying or playing at a lower quality
(possibly degraded by some lost packets) than to stop the whole stream while
buffering.

Some experimental results concerning the transmission packet size and the 
transmission one-way delay and jitter were analysed. They prove that the proposed 
traffic shaping scheme is well behaved even in the case of congested network 
conditions. The testing was done while transmitting over a LAN, but experiments 
with transmission over WAN are in progress. The results may be improved if some 
other feedback-controlled measures (such as real-time modification of the MPEG 
encoding rate) were used in conjunction with the current ones. 

Abstract. A high priority real-time connection is denied admission to an ATM
network if sufficient bandwidth is not available along all suitable paths through 
the network. Bandwidth reallocation and dynamic active channel re-routing are 
techniques that can be used to admit high priority real-time connections where 
traditional CAC techniques would deny admission. A node can select lower 
priority channels, reallocate their bandwidth to the new higher priority 
connection being admitted, and reroute those channels so that their QoS 
requirements and transmission deadlines can still be satisfied. At call 
admission time, one or more backup channels are established for those primary 
channels that are likely to be selected as victims for bandwidth reallocation. 
This allows reroutes to be handled quickly and efficiently. When reroutes 
occur, the protocols ensure that the transmitted data are received on time and in 
sequence, which is essential for real-time communications. SANRoP, a cell 
based discrete event simulator, was developed to simulate these protocols in an 
ATM network in order to determine how well they perform. 



1 Introduction 

We examine the problem of call admission of prioritized real-time communications 
channels with call establishment deadlines in an ATM network. Channel re-routing 
protocols and the technique of bandwidth reallocation are tools for admitting high 
priority real-time connections when traditional call admission control (CAC) 
algorithms would deny admission due to insufficient resources. Since we do not want 
to renege on prior commitments, two main problems must be solved. First, how can 
the network admit high priority calls when there is insufficient bandwidth available? 
Reallocating bandwidth from the lower priority applications can resolve this problem. 
However, this brings up the second problem. Channels whose bandwidth has been 
reallocated elsewhere will likely need to be rerouted so that they can also meet their 
own QoS requirements and their transmission deadlines. However, this is not a trivial 
problem. 

When a call is admitted to a typical ATM network, a path/route is selected through
the network, and bandwidth/resources sufficient to meet the Quality of Service (QoS)
requirements and the traffic contract are allocated/reserved along that path for the
duration of the call. Once bandwidth is allocated to a channel, other channels cannot
use it until it is released by closing the connection. This means that other calls
may not be admissible due to a lack of available resources. Thus, when the network
load is near capacity, such as at peak usage times associated with video on demand
services, interactive network games, multi-party teleconferencing, etc., most of
the network's resources will likely be tied up and unavailable for high priority
real-time connection requests.

In recent work, the issue of channel rerouting has been examined for the purpose of 
ensuring that the network is fault tolerant in the face of system component failures. A 
reactive method [3], several forward recovery methods [4, 5, 12, 19], static routing 
methods using local detours [2, 6, 21, 22], and end-to-end detouring methods [1, 10, 
11, 13, 17], have been studied. These methods are designed for fault recovery and are 
unsuitable when QoS guarantees and timing constraints associated with real-time 
channels must be continuously satisfied, even during channel rerouting [15, 16]. 
However, these techniques can be adapted, with proper route selection for the primary 
and backup channels, to support dynamic channel rerouting. 

In this paper, an end-to-end detouring approach and two local detouring 
approaches (node-level detours and port-level detours) are described as methods for 
rerouting channels. The detours in our methods are chosen to have as minimal an 
impact as possible on the path lengths/costs of the connections after reestablishment, 
thus eliminating the problems with meeting the connection's QoS requirements and 
transmission deadlines (if any). In [15], some initial results of a simulation study 
were presented which show that node-level detours can be successfully used to admit 
high priority calls in a network experiencing relatively moderate load conditions. In 
[16], the algorithms used for establishing and activating node-level 
detours were presented in more detail. For this paper, a new simulation study was 
performed to see how these new protocols performed under moderate loads, when 
both node-level and port-level detouring were employed. 

The rest of this paper is organized as follows. Section 2 presents the technique of 
bandwidth reallocation for releasing bandwidth for use by higher priority channels. 
Section 3 presents the algorithms for rerouting channels selected to have their 
bandwidth reallocated. Section 4 presents methods for route and detour selection to 
support these techniques while minimizing resource allocation / consumption. 
Section 5 presents the results of the simulation study on the three detouring 
approaches mentioned above. Section 6 concludes with a discussion of future work and a 
summary of the work presented in this paper. 



2 Bandwidth Reallocation 

When a high priority real-time channel is requested and insufficient bandwidth exists 
on all paths through the network capable of supporting the channel, then the network 
must either obtain the necessary bandwidth for the channel or deny the request for 
access to the network. Since denying the request is not a desirable result for high 
priority connections, some means of obtaining bandwidth must be employed. The 
naive solutions for admitting high priority real-time channels by statically pre- 
reserving a percentage of each link’s capacity, or by renegotiating the QoS guarantees 
associated with low priority channels, are often not feasible. 

The solution adopted here is to use bandwidth reallocation [15, 16]. Bandwidth 
can be stolen from existing connections starting with the lowest priority connections 
first. Existing lower priority connections are selected and their bandwidth is 
reassigned to the higher priority channel by the call admission control algorithm until 
enough bandwidth is freed to establish the high priority connection. 
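
The victim selection just described can be sketched as follows. This is a minimal illustration, not the CAC algorithm of [15, 16]: the `Channel` record, the `select_victims` function, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    name: str
    priority: int      # assumed convention: larger value = higher priority
    bandwidth: float   # bandwidth reserved on the contended link

def select_victims(channels, free_bw, needed_bw, new_priority):
    """Pick lower priority channels, lowest first, until enough bandwidth is freed.

    Returns the list of victims, or None if the request cannot be satisfied.
    """
    victims = []
    available = free_bw
    for ch in sorted(channels, key=lambda c: c.priority):
        if available >= needed_bw:
            break                      # enough bandwidth has been freed
        if ch.priority >= new_priority:
            break                      # never take bandwidth from an equal or higher priority
        victims.append(ch)
        available += ch.bandwidth
    return victims if available >= needed_bw else None
```

Returning `None` corresponds to denying the request. Each victim returned would then have to be rerouted as described in Section 3.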

Sufficient time must be allowed for: 1) the network to activate (and maybe even 
establish) the low priority channel’s backup channel for this node’s local detour, and 
2) the deprecated sub-channel just bypassed to be drained of its residual cell streams, 
before the bandwidth can be transferred. The time necessary to accomplish these two 
tasks will cause the CAC algorithm, trying to admit the high priority channel, to incur 
additional delay. The techniques described in this paper, and in [15, 16], attempt to 
minimize this delay so that call setup deadlines can still be met. 



3 Channel Rerouting 

Rerouting a channel depends upon whether the connection is still active after its 
bandwidth has been reallocated. Inactive channels do not need to be rerouted 
immediately, but may be rerouted eventually after sufficient time has elapsed. Active 
channels, however, must be immediately rerouted so that their QoS guarantees and 
transmission deadlines can still be met. 

Rerouting an active channel is more problematic than rerouting an inactive one 
because of the delay requirements associated with active channels. Most likely there is 
insufficient time to allow the channel to drain before rerouting it. The reallocation 
point sends signals notifying the detour points of the channel reroute. Once the detour 
is activated, the deprecated part of the channel must be allowed to drain its stream of 
cells before being torn down. Unfortunately, this introduces a problem. It is now the 
case that two parts of the same channel are traversing partially independent paths 
through the network, and could end up being delivered out of order at the destination. 

Our method uses delimiters to allow the detour points to distinguish how the cells 
arrived, i.e. whether they traversed the original route, or the detoured route. The 
detour points buffer cells arriving from the detoured route until the end of the cell 
stream arrives from the deprecated subpath, i.e., the channel end-stream identifier cell 
is seen. With proper path layout, the size of the buffer allocated to this channel will 
not be very large, thus having a negligible impact upon network resources. 
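
The buffering rule at a detour point can be illustrated with a small sketch. It assumes that the detour point can tell which route a cell arrived on (modelled here by a hypothetical `via_detour` flag) and represents the channel end-stream identifier cell by a sentinel object; the class and method names are illustrative only.

```python
from collections import deque

END_OF_STREAM = object()   # stands in for the channel end-stream identifier cell

class DetourPoint:
    def __init__(self):
        self.draining = True      # the deprecated subpath has not yet drained
        self.buffer = deque()     # cells held back from the detoured route
        self.delivered = []       # cells forwarded downstream, in order

    def on_cell(self, cell, via_detour):
        if not via_detour:
            if cell is END_OF_STREAM:
                # Old subpath is empty: release everything buffered from the detour.
                self.draining = False
                self.delivered.extend(self.buffer)
                self.buffer.clear()
            else:
                self.delivered.append(cell)
        elif self.draining:
            self.buffer.append(cell)   # hold detoured cells until the old path drains
        else:
            self.delivered.append(cell)
```

Cells 1 to 3 arriving on the original route interleaved with cells 4 and 5 on the detour are forwarded as 1, 2, 3, 4, 5 once the end-stream cell is seen. The buffer only has to hold the cells that overtake the tail of the old stream.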



4 Channel Admission 

The selection of the routes for the primary path and all of its detours depends upon the 
priority of the channel being established, and whether the end-to-end transmission 
deadlines of the primary subpath can be satisfied by some detour. Low priority 
channels can tolerate increases in path length/cost which have a negligible impact on 
how the network continues to meet the QoS guarantees. Thus, their primary path can 
be a shortest “cost” path through the network, and the detours can be selected to 
bypass the primary subpaths accordingly. For really low priority channels, it may be 
possible to use a single alternate route, i.e. an end-to-end detour. 

For high priority real-time channels, a different approach is necessary. These types 
of channels don’t tolerate increases in path length/cost very well, if the network still 
wants to meet their QoS guarantees and transmission deadlines (very small increases 
may be tolerated with changes in hop by hop deadlines along the local detour, which 
is an exception handled by the algorithm). Thus, if these channels are potential 
victims, then a sub-optimal path through the network should be chosen so that the 
shortest “cost” path can be used as an end-to-end detour in order to maintain, or even 
reduce, the path length/cost when it replaces the respective primary path. 



4.1 Local Detour Path Selection 

In addition to selecting a primary path at call admission time, a set of alternative 
backup path candidates for use as local detours are chosen for use in the event of 
bandwidth reallocation. An attempt is made to find multiple detours for every 
primary subpath of the primary path so that the best option can be selected for use at 
reroute time in order to continue to meet the QoS requirements and any transmission 
deadlines of the connection being rerouted. These local detours are chosen so that 
they don't use any nodes or links that are involved in the primary path, except for the 
two detour end-point candidates for which the path detour is being selected. 

In [15, 16], algorithms are presented for finding a Local Detour Set, which is a set of 
local detours, each of which routes around the same specific node of the primary path. 
Once a Local Detour Set is computed for each node of the primary path, the “best” 
candidate is selected from this set as the detour around that node, should it be needed 
by a reroute operation. For efficiency, each node in the primary path can calculate its 
own local detour set and select its best detour during the call admission process. 
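
As a rough illustration of the constraint on local detours, the sketch below finds the cheapest detour around one node of the primary path while avoiding every other primary path node except the two detour end-points. It is not the algorithm of [15, 16]: it returns only a single cheapest candidate rather than a full Local Detour Set, and the graph representation and function names are assumptions.

```python
import heapq

def shortest_path(graph, src, dst, banned_nodes=frozenset()):
    """Dijkstra over graph = {node: {neighbour: cost}}; returns (cost, path) or None."""
    heap = [(0, src, [src])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in settled and nxt not in banned_nodes:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None

def best_local_detour(graph, primary, i):
    """Cheapest detour around primary[i], joining its two neighbours on the primary path."""
    a, b = primary[i - 1], primary[i + 1]      # the two detour end-points
    banned = set(primary) - {a, b}             # no other primary path node may be used
    return shortest_path(graph, a, b, banned)
```

A result of `None` means that no detour around that node exists under these constraints, in which case the node cannot offer a node-level detour for the channel.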



4.2 Primary Path Selection 

It is desirable to select a path which does not deviate too much from the cost of the 
optimal/shortest path, as this can have a negative impact on the network by overly 
wasting allocated resources, and may even be unsuitable for the channel (i.e. the end- 
to-end delay may be too great to meet the channel's requirements). The path selected 
may be an optimal path, or it may be a sub-optimal path. 

A simple path selection algorithm [15, 16] is used to find acceptable sub-optimal 
paths through the network. It finds the shortest cost path through the network, and 
disallows it. The algorithm then finds another shortest cost path through the network 
that detours around the disallowed path. The algorithm is a very simple heuristic that 
finds a path through the network, which is not the shortest cost path. This heuristic 
algorithm is used because finding an optimal set of paths through the network subject 
to a set of multiple constraints is known to be an NP-complete problem. If more 
extensive parameters need to be considered in choosing a primary path, then 
algorithms such as the ones in [7, 14, 18, 20], which take into consideration 
throughput, delay, and error rate, can be used to find routes through the network. 
Although the shortest cost path is explicitly disallowed as the primary path, it may be 
one of the detours around a reallocation point. 
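
The heuristic can be sketched in a few lines. The sketch assumes that "disallowing" the shortest path means forbidding its links in the second search; the text does not say whether links or nodes are excluded, so this is one possible reading, and the function names are illustrative.

```python
import heapq

def cheapest_path(graph, src, dst, banned_links=frozenset()):
    """Dijkstra over graph = {node: {neighbour: cost}}; returns (cost, path) or None."""
    heap = [(0, src, [src])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph[node].items():
            if nxt in settled or frozenset((node, nxt)) in banned_links:
                continue
            heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None

def select_primary(graph, src, dst):
    """Return (shortest, sub_optimal): the shortest cost path and a path that avoids its links."""
    shortest = cheapest_path(graph, src, dst)
    if shortest is None:
        return None, None
    _, path = shortest
    banned = {frozenset(link) for link in zip(path, path[1:])}   # assumed: links are disallowed
    return shortest, cheapest_path(graph, src, dst, banned)
```

Under this reading, the second path serves as the sub-optimal primary path and the first remains available as an end-to-end detour, so a reroute does not increase the path cost.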

If it is impossible or prohibitively expensive to select primary and backup paths 
that satisfy the delay conditions needed for rerouting, or if the end-to-end delay of the 
primary path or the backup paths is too great to be usable by the connection, then the 
connection is assigned a channel which is not re-routable. 



4.3 Pre-allocation of Backup Channels 

Since no a priori knowledge is available regarding the sequence of call setup requests 
and the associated bandwidth requirements, it is difficult to apply a deterministic 
technique for creating backup channels for primary subpath detours. A naive 
approach would be to create backup channels for all channels admitted to the 
network. However, this would require the network to allocate twice the resources, or 
more, for a channel than if no backup channels had been created at all. 

To prevent explosive over allocation of resources for this technique, backup 
channels will only be preallocated, i.e. established in advance, for those channels 
most likely to be selected as victims for bandwidth reallocation. In addition, some 
backup channels can be multiplexed across their overlapping subpaths, and if the 
network determines that too many backup channels are being established, then the 
network may opt to establish fewer backup channels which cover more potential 
reallocation points each. However, each detour must still meet the QoS guarantees 
and transmission deadlines of each of the subpaths it is intended to replace. 

The technique applied here is to allocate backup channels for a small percentage of 
the primary channels at a given node. The algorithm in Figure 1 is used to determine 
whether or not a channel needs pre-allocated backup channels for its local detours. 
The algorithm determines the set, LP, of channels that would need pre-allocated 
backup channels, selected from lowest priority up, until enough bandwidth, the 
minimum backup channel bandwidth threshold (MBBT), has been selected so that 
new high priority calls will not be delayed excessively during call setup. For each 
node, the set LP is recomputed each time a new channel is admitted. If the new 
channel ends up in the node’s set LP, then the node will cause a backup channel to be 
pre-allocated, i.e. established, along the node’s detour. The MBBT can be 
dynamically adjusted by the network based upon its load and needs so that enough 
resources can be maintained in reserve for admitting high priority connections to the 
network. 
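
The computation of the set LP can be sketched as follows. The tuple layout, the function name `compute_lp`, and the use of a non-strict comparison against the MBBT are assumptions made for illustration.

```python
def compute_lp(pbc, mbbt):
    """Select channels from PBC, lowest priority first, until their bandwidth reaches MBBT.

    pbc  : list of (name, priority, bandwidth) tuples; a smaller priority value is lower
    mbbt : minimum backup channel bandwidth threshold
    """
    lp, total = [], 0.0
    for name, _priority, bandwidth in sorted(pbc, key=lambda c: c[1]):
        if total >= mbbt:
            break                      # enough bandwidth is already covered by backups
        lp.append(name)
        total += bandwidth
    return lp
```

A node would rerun this each time a channel is admitted and pre-allocate a backup channel only if the new channel appears in the result. Raising the MBBT enlarges LP and therefore the number of pre-allocated backup channels.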



4.4 Port Level Detours 

In [15, 16], all channel reroutes were performed using backup channels that 
completely bypass the reallocation point node. In this paper, a finer grained approach 
is also studied. When a potential reallocation point notices that it is also a detour 
point for another node, and that a backup channel (of which it is an end-point) was 
pre-allocated for the other node, then it considers the possibility of using port-level 
detours whenever possible, for rerouting the channel locally. Thus, when a 
reallocation point needs to select a victim channel for bandwidth reallocation and 
channel rerouting, it checks to see if it is the end-point of a backup channel 
established for the channel at another node in the network. If so, and the backup 
channel does not flow through the output port(s) which are commandeering 
bandwidth from this channel, then the backup channel flowing out of this node can be 
used, rather than using the backup channel specifically pre-allocated for this node, 
which bypasses it altogether. This works as long as the path length of the backup 
channel for which this node is an end-point does not significantly exceed the path 
length from this reallocation point to either of its detour points. This will reduce 
some of the complexity of the channel reroute, since only one reroute signal will need 
to be sent, and might reduce the channel reroute time since the reroute signal may not 
have to travel as far to reach the other end-point of the backup channel. Not every 
channel reroute will be able to take advantage of port-level rerouting. However, 
whenever possible, the network will use this finer grained rerouting technique of 
bypassing the output port, rather than the entire reallocation point, since it will be 
quicker than its node-level counterpart. 



Algorithm: CAC-Prealloc-Backup-Channels 

Input:  lc_m, NBC, PBC, BC 
Output: NBC, PBC, BC 

// lc_m is the channel being established at this node 
// NBC is the set of channels for which no backup channel is preallocated 
// PBC is the set of channels for which a backup channel might be preallocated 
// BC is the set of channels for which a backup channel is definitely preallocated 

If lc_m can be dropped or suspended indefinitely, Then 
    Add channel to the non-backed-up channel set NBC 
    Do not add channel to the potential-backup set PBC and 
    do not add channel to the backed-up channel set BC 
    Do not create backup channels for the local detour around this node 
    return 
ElseIf channel lc_m can tolerate path computation and channel setup delays, Then 
    Add channel to the non-backed-up channel set NBC 
    Do not add channel to the potential-backup set PBC and 
    do not add channel to the backed-up channel set BC 
    Do not create backup channels for the local detour around this node 
    return 
Else 
    Add channel to the potential-backup set PBC 
    Let set LP contain channels lc_i which are the n lowest priority channels in PBC where: 
        n ≥ k such that Σ_{i=1..k} lc_i ≥ MBBT (the Minimum Bandwidth Backup Threshold) 
End-If 

If lc_m ∈ LP Then   // i.e., channel lc_m is one of the channels to create a backup channel for 
    build instructions for detour points p_i and p_j, the end points of the local detour around 
    this node, to establish the backup channel along the specified local detour 
    insert these instructions into the connection setup message 
    forward the connection setup message (it will be processed by one of the detour points) 
    wait for the call accept signal to come back 
    when the call accept signal comes back, do the following: 
        For all channels lc_k ∉ LP, such that lc_k ∈ PBC Do 
            If lc_k has a pre-allocated backup channel along its local detour around this node Then 
                release the backup channel since it is no longer needed 
            End-If 
        End-For 
Else 
    we do not need to create a backup channel along the local detour around this node 
End-If 



Fig. 1. Backup Channel Preallocation Setup Algorithm 



5 Simulation Results 

A cell-based discrete event simulator, called SANRoP (Simulator for ATM Network 
Routing Protocols), was developed that is capable of simulating the process of 
sending cells through an ATM switching network. Discrete events are executed to 
transfer cells and signals among the components of an ATM network. SANRoP uses 
the algorithms and procedures described here, and in [15, 16], for computing a 
channel’s primary path, primary subpaths, and local detours for backup channels. 

The results of the simulation are presented in Table 1. Two types of networks were 
simulated: mesh grid networks, and general topology networks. The mesh grid 
network consisted of 24 switches and 12 hosts. The general topology network 
consisted of 10 switches and 6 hosts. The network load varied from light (i.e., 0 - 
25% overall average network utilization) to moderate (i.e., 25 - 60% overall average 
network utilization) during the times at which bandwidth reallocation and channel 
rerouting occurred. The cost adjustments shown represent the effects of using a 
detour that increases (or decreases in the case of sub-optimal primary paths with end- 
to-end detours) the overall cost of the path used by the channel. Small changes in cost 
represent a change of approximately 0 - 7.5% of the original primary path. Medium 
changes in cost represent a change of approximately 7.5 - 15% of the original primary 
path. Large changes in cost represent a change of approximately 15 - 30% of the 
original primary path. Very large changes in cost represent a change of 
approximately 30 - 60% of the original primary path. As the path lengths increased 
beyond the medium category the amount of buffering required by the switches during 
channel rerouting increased only when sub-optimal routing with end-to-end detours 
was used. However, in all cases, the maximum buffer size remained less than 20 
cells, which can be easily handled with sufficient memory in the switches. In all 
simulation runs, no cells missed their transmission deadlines during channel reroute 
times, and the network fulfilled its QoS guarantees, even when bandwidth reallocation 
and channel rerouting occurred due to admission of a high priority channel for which 
one or more nodes had insufficient resources available. 

The channel reroute times varied from 167 μsec to 2,684 μsec. This is well within 
the tolerances allowed with call establishment deadlines on the order of 2-5 sec. 
When an optimal path was used for the primary channel, the amount of buffering 
required remained fairly minimal, which is expected when detours that extend the cost 
of the channel are utilized. Also, the time required, and therefore the delay 
encountered by high priority channels requesting access to the network, actually goes 
down when node-level detours are used instead of end-to-end detours. Thus, when a 
victim is itself a real-time channel (albeit one with a relatively lower priority), it is 
likely to be rerouted fairly quickly and efficiently. 

When a sub-optimal path was used for the high priority channels, the amount of 
buffering required depended upon several factors. First, as the change in the cost of 
the channel is increased (i.e., the path cost is reduced by larger amounts, since these 
are real-time channels), the maximum required buffer space increased. 
However, as the link utilizations increased along the routes for the backup channels, 
the cells traversing the shorter route were sometimes slowed down, thus reducing the 
amount of buffering required by a minor amount. However, buffer sizes never grew 
very large, and in all cases, the channel reroute times were still quite acceptable. 

The use of port level detouring allowed switches to select victims which did not 
have pre-allocated backup channels bypassing that switch. The switch simply used a 
backup channel allocated to bypass some other node in the network. This 
allowed nodes to continue to select victims, without needing to establish a new 
backup channel, under certain conditions when there were insufficient victims locally. 
This provided additional benefit to the nodes in about 10% of the cases where 
bandwidth reallocation and channel rerouting were required. 

These results demonstrate that the QoS requirements of a channel can still be met 
and the ordering of the ATM cells to the application layer preserved when channel 
rerouting occurs, even when moderate traffic loads are present on the network. These 
results were expected due to the nature of the path selection algorithms. This means 
that, even for real-time channels, the cell stream integrity was maintained, i.e. the 
amount of delay due to route detouring was negligible and therefore insufficient to 
cause loss of QoS, which is an important requirement for real-time communications. 



6 Conclusion 

In this paper, bandwidth reallocation was examined as a tool for admission of high 
priority real-time channels in an ATM network which is experiencing partial 
overload. In a traditional ATM network, high priority calls may not be admissible 
due to a lack of available resources, since admitting them could potentially cause the 
QoS requirements of previously admitted calls to be violated. In order to admit them 
to the network, it is necessary to make sufficient resources available for the high 
priority channels while still meeting the QoS guarantees of the already admitted 
connections. When a high priority application requests a channel, bandwidth assigned 
to a lower priority channel is reallocated to the higher priority channel being 
established. Since the lower priority channel no longer has any bandwidth, it is 
necessary to reroute the channel along a local detour around the node that caused the 
reallocation so that its own QoS requirements can still be met. 

Routing issues for supporting end-to-end, local, and port-level detours were 
examined, and methods were given so that the paths used by a detour could meet the 
channel transmission deadlines. Optimal and sub-optimal routes were chosen for the 
primary path of the channel depending upon the priority level of the connection. 
Detours were selected which allowed the channel to continue to meet its QoS 
requirements when it was rerouted around a node that reallocated its bandwidth 
elsewhere. Methods were presented to ensure that the cells of a channel are delivered 
in sequence, and on time, to the destination, even though the cell stream was split in 
the middle and routed across two separate paths through the network. A simulation 
was constructed to demonstrate that the use of these detours resulted in the QoS 
guarantees and transmission deadlines still being met for rerouted real-time channels. 

Future work will involve looking into the possibilities of multiplexing the backup 
channels for different primary channels along common subpaths so that fewer 
network resources, i.e. bandwidth, need to be allocated to the backup channels 
themselves. This would increase the number of connections that can be accepted 
simultaneously by the network. The methods described in [8, 9] may be adaptable 
for use in the simulator in order to multiplex the backup channels assigned to 
be shared by the local detours into a single backup channel which requires fewer 
network resources. In addition, it would be useful to examine alternative methods for 
calculating a channel’s primary path, and detour subpaths, which may lead to even 
more flexibility in selecting detours around a node which must reallocate the 
bandwidth of one channel to another. Finally, simulations need to be run for large 
heavily loaded networks with wildly varying traffic patterns as a function of time in 
order to see what happens in the worst case when a set of channel reroutes are 
performed simultaneously. 



Abstract. The Internet explosion has driven extensive demand for distributed 
multimedia presentations (DMPs), which provide multiple users with QoS-controlled 
multimedia services under multicast communications, such as media distribution 
and virtual classrooms. In this paper, we (i) identify the primary issues for 
ensuring smooth multiple-stream multimedia presentations in the multicast 
environment, (ii) propose a formal temporal definition to specify related 
attributes of a multimedia presentation, and (iii) propose a temporal control 
mechanism to achieve the temporal synchronization of multicast multimedia 
presentations. Based on the proposed temporal definition and control mechanism, 
a multimedia middleware for multicasting multiple streams is developed. The 
multimedia middleware is named TVMS (The Visual Multicast Systems), which 
(i) provides a flexible authoring tool to allow users to author a multiple-stream 
multimedia presentation in a multicast environment and (ii) achieves smooth 
multimedia presentations. 



1 Introduction 

With the advances of computer and communication technologies, distributed 
multimedia presentations (DMPs), e.g., video distribution and virtual classrooms, 
have become popular applications. DMPs can be characterized by the integrated 
multicast communications and presentation of multiple continuous and static 
media. Based on multicast communications, the server transmits data to multiple 
recipients simultaneously, each of whom has the same multicast address. A 
continuous medium, such as video or audio, is a time-dependent medium that 
possesses temporal relations between media units [2]. A static medium, such as 
text or still image, is a time-independent medium that has no temporal relation 
between media units, but may have inter-media temporal relations with other 
media streams. Since the transmission delay is nondeterministic in a distributed 
environment, temporal anomalies always exist during a multimedia presentation. 
Thus, one of the important issues in implementing DMPs is to resolve the 
multiple-stream multimedia synchronization problem that is associated with the 
multicast delivery. 

The goal of multiple-stream multimedia synchronization is to preserve the temporal 
relations of media streams as much as possible during the presentation. Multiple 
streams mean that media streams are retrieved from their own media bases and 
are transmitted via independent network channels with different network QoS 
(Quality-of-Service) requirements [12][17]. Therefore, the delay variance of 
multicast communications and the respective features of multiple streams 
complicate the multimedia synchronization. 

Multimedia synchronization is one of the important issues for deriving smooth 
multimedia presentations in the distributed environment. Two types of temporal 
synchronization are intra-medium synchronization and inter-media synchronization. 
Intra-medium synchronization ensures the intra-medium temporal relation of a 
medium stream and compensates for jitter, which is the asynchronous anomaly 
between consecutive media units of a medium stream. Two widely adopted 
intra-medium synchronization schemes are the blocking and non-blocking 
synchronization schemes. (1) The blocking scheme: If an expected medium unit does 
not arrive on time, the presentation process suspends its presentation until the 
expected medium unit arrives. (2) The non-blocking scheme: If an expected medium 
unit does not arrive on time, the presentation process immediately re-presents 
the most recently received medium unit. Inter-media synchronization ensures 
inter-media temporal relations among related media streams and compensates for 
skew, which is the time difference between related media streams. The 
master-stream control scheme can be adopted to achieve inter-media 
synchronization. The master stream dominates the commencement and finish of a 
presentation at each synchronization point. (1) If a slave stream finishes its 
presentation earlier than the master stream at a synchronization point, the slave 
stream has to block its presentation until the master stream finishes its 
presentation. (2) When the master stream finishes its presentation at a 
synchronization point, the late slave streams have to discard media units to keep 
pace with the master stream. 
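
The difference between the two intra-medium schemes can be seen in a small timing sketch. The model is an assumption made for illustration: unit i is nominally due at time i times the period, `arrivals[i]` is its arrival time, and under the non-blocking scheme a unit that misses its slot is skipped while the last unit shown is re-presented.

```python
def blocking_playout(arrivals, period):
    """Suspend until each expected unit arrives; later units are pushed back."""
    events, clock = [], 0
    for unit, arrival in enumerate(arrivals):
        start = max(clock, arrival)        # wait if the unit is late
        events.append((start, unit))
        clock = start + period
    return events

def non_blocking_playout(arrivals, period):
    """Keep the nominal schedule; repeat the last shown unit when one is late."""
    events, last = [], None
    for slot, arrival in enumerate(arrivals):
        due = slot * period
        if arrival <= due:
            last = slot                    # expected unit is available on time
        events.append((due, last))         # otherwise re-present the previous unit
    return events
```

With arrival times 0, 1.5, 2.0, 2.5 and a period of 1, the blocking scheme shows every unit but ends half a period late, whereas the non-blocking scheme stays on schedule and shows unit 0 twice in place of the late unit 1.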

A DMP essentially consists of two components: an authoring system and a temporal 
control system. An authoring system is the generating mechanism of behavior 
specifications. A behavior specification is composed of media attributes of 
related streams, including the involved media streams, temporal and spatial 
attributes of each medium stream, and the temporal relationship between related 
media streams. An authoring system allows a user to specify behavior 
specifications of the corresponding multimedia presentation. The temporal control 
system is the synchronization and presentation mechanism that is derived from 
function specifications. Function specifications describe how to achieve the 
temporal relations existing in the corresponding behavior specifications. In this 
paper, to resolve multimedia synchronization in the multicast presentation 
environment, we propose a formal definition approach to obtain concise behavior 
specifications. Based on the proposed formal definition, a multicast multimedia 
middleware named TVMS (The Visual Multicast Systems) is developed to achieve 
smooth pre-orchestrated DMPs. 

The rest of the paper is organized as follows. Section 2 describes the pro- 
posed multicast multimedia network architecture. Section 3 describes the formal 
definition of the temporal relationships in a multimedia presentation. Section 4 
describes the system architecture of the temporal control mechanism. Section 5 
concludes this paper. 



2 Network Architecture 

The proposed multicast multimedia network, which is called the Multicast 
MultiMedia Communication Network (M³CN), is a two-level hierarchical architecture 
that spans a distributed environment. The M³CN consists of a WAN and many LANs 
that are attached to the WAN. Each LAN is composed of a local Multicast 
MultiMedia Server (M³ server) and clients. An M³ server transmits media units to 
hosts via LAN and/or WAN. Clients of a presentation group present the same 
multimedia resource simultaneously and may be scattered over different LANs. 

In M³CN, the concept of a “virtual server” is adopted. A virtual server receives 
media units from the “physical server”, which owns the presentation resource, and 
re-transmits them to end clients. The virtual server is a local server of a LAN 
and compensates for the WAN’s anomalies by pre-depositing some media units and 
applying corresponding synchronization schemes. The concept of virtual servers 
can reduce the overhead of synchronization control in clients because the WAN’s 
asynchronous anomalies are compensated for and media streams are synchronized at 
virtual servers. Clients can therefore be simpler, low-end devices, e.g. a 
Set-Top-Box, a diskless networking PC, or a networking TV. 

The multicast presentation system is composed of the physical server system 
(PSS), the virtual server system (VSS) and the client system (CS). The PSS 
retrieves media units from media bases, and multicasts media units to virtual 
servers according to the information schedule file. Through the PSS, a system 
manager can specify a communication configuration that contains the multicast 
group address, communication socket ports, and media files. The VSS receives 
media units from the PSS and stores media units in media buffers temporarily. 
According to the presentation schedule, the VSS re-multicasts media units to 
clients with the synchronization control. With the help of the VSS, temporal
anomalies induced by the WAN’s transmission can be compensated for. The CS
receives media units and achieves a smooth presentation by applying the
synchronization and presentation control.
3 The Formal Definition of Multimedia Presentations

A formal definition mechanism is needed to specify the related temporal
attributes in a multimedia presentation. An authoring system can adopt the
formal definition mechanism to let users specify what their multimedia
presentations look like. This section presents the formal definition of a
multimedia presentation.

A multimedia presentation $MP$ is defined as $MP = \{MS, TS\}$, where
$MS = \{ms_1, ms_2, \ldots, ms_n\}$ represents the set of $n$ involved media
streams and $TS = \{\text{p-stage}_1, \text{p-stage}_2, \ldots, \text{p-stage}_k\}$
represents the presentation temporal schedule. A presentation temporal schedule
consists of presentation stages ($\text{p-stage}_i$, where $i = 1..k$). Each
$\text{p-stage}_i$ contains $x_i$ presentation sections ($\text{p-section}_j$,
where $j = 1..m$), where $1 \le x_i \le m$ and $x_1 + x_2 + \cdots + x_k = m$.
In other words,

$\text{p-stage}_1 = \{\text{p-section}_1, \text{p-section}_2, \ldots, \text{p-section}_{x_1}\}$,

$\text{p-stage}_2 = \{\text{p-section}_{x_1+1}, \text{p-section}_{x_1+2}, \ldots, \text{p-section}_{x_1+x_2}\}$,

$\ldots$
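
To make the partition of sections into stages concrete, the following is a
minimal Python sketch. The function name `build_schedule` and the string labels
are illustrative assumptions and not part of TVMS; the only thing taken from the
definition is that stage i holds x_i consecutive sections and that the counts
sum to m.

```python
from itertools import accumulate


def build_schedule(section_counts):
    """Split sections 1..m into consecutive stages.

    section_counts[i] plays the role of x_(i+1): the number of sections in
    stage i+1. The labels are hypothetical; only the partition follows the text.
    """
    if any(x < 1 for x in section_counts):
        raise ValueError("every stage needs at least one section (1 <= x_i)")
    ends = list(accumulate(section_counts))   # x_1, x_1 + x_2, ...
    starts = [0] + ends[:-1]
    return [
        [f"p-section{j}" for j in range(start + 1, end + 1)]
        for start, end in zip(starts, ends)
    ]


schedule = build_schedule([3, 2, 1])
for number, stage in enumerate(schedule, start=1):
    print(f"p-stage{number} =", stage)
```

With counts 3, 2 and 1, six sections fall into three stages, which is the same
shape as the example presentation used in this section.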



A presentation stage is a semantic cut of a multimedia presentation. For
example, let the multimedia presentation be a CNN news broadcast about the
chess match between world chess champion Garry Kasparov and the supercomputer
Deep Blue. Figure 1 depicts the presentation as follows. (1) The news reporter
reports the news about a chess match between Garry Kasparov and Deep Blue.
The news reporter’s audio, Garry Kasparov’s video, and the related news text
are presented. (2) Garry Kasparov thinks and moves a piece. Then, the video
of the chess explanation and the text introducing Garry Kasparov are
presented. (3) An agent moves the piece according to Deep Blue’s decision.
The background music and some auxiliary texts are always presented. Thus, the
presentation of Figure 1 is divided into three stages.

Media streams involved in a presentation stage are a subset of $MS$ defined
as $\text{A-MS}(\text{p-stage}_i) = \{A_i(ms_1), \ldots, A_i(ms_n)\}$. In stage
$i$, if media stream $ms_x$ has media units presented, $A_i(ms_x) = 1$,
$1 \le x \le n$; if media stream $ms_x$ has no media units presented,
$A_i(ms_x) = 0$, $1 \le x \le n$.

A presentation section indicates that some media objects have temporal
relationships, e.g. the start relation. Thus, one media object’s presentation in
a section depends on another media object’s presentation status. For example,
the news text and the video of Garry Kasparov appear when a specific audio
segment is presented. As depicted in Figure 1, the presentation of a specific
segment A1 starts the presentations of V1 and T1.

It is not necessary that a media stream always has media units presented
throughout an entire section. In Figure 1, the text stream has nothing to
present between $t_2$ and $t_3$. The time period in which a media stream has
nothing to present is called an idle segment. Idle segments can be defined as
$\text{i-segment}(\text{p-stage}_i, ms_j) = [t_x, t_y]$, where $A_i(ms_j) = 1$
and $ms_j$ has nothing to present from time point $t_x$ to time point $t_y$.
For example, the text medium has an idle segment $\delta_1$ from $t_2$ to
$t_3$ in section 2 of stage 1.
Fig. 1. The display time-bar chart of a presentation temporal schedule.



Temporal relationships of two media objects can be formally defined as
$x(t_1)\,\Theta\,y(t_2)\,[t_3, t_4]$, where $x$ and $y$ denote media objects,
$t_1$ is the display time period of $x$, $t_2$ is the display time period of
$y$, $\Theta$ denotes the type of the temporal relationship, and $t_3$ and
$t_4$ describe the front, tail, or gap time interval for the corresponding
temporal relationship. (Parameters $t_3$ and $t_4$ are optional. If there is no
front, tail, or gap time period, nothing has to be specified.) The temporal
relations denoted by $\Theta$ include the ‘equal’, ‘start’, ‘before’, ‘meet’,
‘during’, ‘overlap’, ‘finish’, and their reversed relations.
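
The seven relations and their reversals can be told apart from the start and
end points of the two display periods. The sketch below is one way to do that in
Python. The endpoint conditions follow the usual interval-algebra definitions,
which is an assumption because the text names the relations without spelling
them out, and the function names are hypothetical. The optional front, tail, and
gap parameters are not modelled.

```python
def _forward(x, y):
    """Return the relation of x to y among the seven named ones, else None."""
    (xs, xe), (ys, ye) = x, y
    if xs == ys and xe == ye:
        return "equal"
    if xe < ys:
        return "before"
    if xe == ys:
        return "meet"
    if xs == ys and xe < ye:
        return "start"
    if xe == ye and xs > ys:
        return "finish"
    if ys < xs and xe < ye:
        return "during"
    if xs < ys < xe < ye:
        return "overlap"
    return None


def temporal_relation(x, y):
    """Classify two display periods given as (start, end) tuples."""
    for interval in (x, y):
        if not interval[0] < interval[1]:
            raise ValueError("display periods must have start < end")
    forward = _forward(x, y)
    if forward is not None:
        return forward
    # Not one of the seven forward relations, so the reversed one must hold.
    return _forward(y, x) + " (reversed)"


print(temporal_relation((1, 2), (0, 5)))
print(temporal_relation((3, 5), (0, 2)))
```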

Based on the proposed formal definition, the temporal relationships of the
media objects involved in a presentation stage are defined as

$TR(\text{p-stage}_i) = \{o_{p_1}\,\Theta\,o_{q_1}, \ldots, o_{p_l}\,\Theta\,o_{q_l}\}$,

where media objects $o_{p_j}$ and $o_{q_j}$, $1 \le j \le l$, have some temporal
relationship in stage $i$. Each presentation stage is associated with a master
stream. The master stream of a stage is formally defined as
$M(\text{p-stage}_i) = \{ms_y\}$, where $1 \le y \le n$ and $A_i(ms_y) = 1$.
Figure 2 gives the formal definitions of the illustrated multimedia
presentation depicted in Figure 1.

4 Software Architecture of the Temporal Control Mechanism

This section describes the system architecture and prototype implementation of
TVMS, which is composed of the physical server system (PSS), the virtual server
system (VSS), and the client system (CS).



564 H.-Y. Kung and C.-M. Huang 



MP = {MS, TS},
MS = {video, audio, text},
TS = {stage-1, stage-2, stage-3},
stage-1 = {section-1, section-2, section-3},
stage-2 = {section-4, section-5},
stage-3 = {section-6},
A-MS(stage-1) = {1, 1, 1},
A-MS(stage-2) = {1, 1, 1},
A-MS(stage-3) = {1, 1, 1},
i-segment(stage-1, video): two idle segments,
i-segment(stage-1, text): two idle segments, one of which is [t2, t3],
i-segment(stage-2, audio): one idle segment,
i-segment(stage-2, video): one idle segment,
i-segment(stage-2, text): two idle segments,
i-segment(stage-3, video): two idle segments,
M(stage-1) = M(stage-3) = {audio},
M(stage-2) = {video},
TR(stage-1) = {A1 during V1, A1 during T1, A1 finish T2},
TR(stage-2) = {V2 overlap A2, V2 finish T3},
TR(stage-3) = {A3 during V3, A3 equal T4}.

[The time-point subscripts of the remaining idle segments and of the temporal
relations are illegible in the source figure.]



Fig. 2. Formal definitions of the illustrated multimedia presentation depicted in
Figure 1.



4.1 Physical Server System (PSS) 

Three main components of the PSS are Synchronizer, Media Sender, and
Continuous Media Reader, which are depicted in Figure 3.

- Synchronizer. Synchronizer is responsible for the coarse-grain synchroniza- 
tion to achieve section and stage synchronization based on the parallel-last 
scheme. With the parallel-last scheme, each medium stream can be com- 
pletely transmitted regardless of media processing anomalies. 

- Media Sender. Media Sender is responsible for retrieving media units from
media buffers and transmitting them to networks. Moreover, Media Sender
should cooperate with Synchronizer to achieve inter-media synchronization.

- Continuous Media Reader. Continuous Media Reader is responsible for
retrieving continuous media units from the media base and putting them
into media buffers. The purpose of media buffers is to compensate for the
irregular media retrieval time from the media base. For static media, since
(1) the volume of media units is much less than that of continuous media and
(2) the temporal requirement is not critical, static media units are directly
retrieved from media bases by the corresponding Media Senders.
[Figure 3 link labels: UDP multicast, RMTP multicast]

Fig. 3. Architecture of the physical server system. 



Rate control is used to keep multicasting continuous and steady for
continuous media streams. For example, assume that the default transmission rate
of a video stream is 15 frames per second (fps). Hence, Media Reader has to
retrieve a video frame every 1/15 second from the video base and put the
video frame into the video buffer. However, since a regular operating system,
e.g., Unix or Windows NT, is a time-sharing, multi-process system, it
is difficult to control exactly when a video frame is retrieved and when
it is multicast. Because of this inaccurate execution time,
Media Reader cannot retrieve media units from the media base at a constant
retrieving rate. As a result, the media buffer may become empty because Media
Sender multicasts media units at the default transmitting rate. When the
buffer is empty, Media Sender has to suspend its work and wait for
Media Reader to retrieve media units into the buffer. The suspension time
induces a discontinuous transmission.
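
The effect can be reproduced with a small simulation. The sketch below is an
illustration under stated assumptions, not the TVMS implementation: the function
name `simulate_sender`, the uniform jitter model, and the `prefill` parameter
(how many units are buffered before sending starts) are all hypothetical.

```python
import random


def simulate_sender(ready_times, send_interval, prefill=1):
    """Count how often a fixed-rate sender finds the buffer empty.

    ready_times[i] is the time at which the reader has placed unit i in the
    buffer (non-decreasing). The sender starts once `prefill` units are
    buffered and then tries to send one unit every `send_interval`.
    Returns (number_of_stalls, total_stalled_time).
    """
    next_slot = ready_times[prefill - 1]   # sender starts after the prefill
    stalls = 0
    stalled_time = 0.0
    for ready in ready_times:
        if ready > next_slot:              # buffer empty: wait for the reader
            stalls += 1
            stalled_time += ready - next_slot
            next_slot = ready
        next_slot += send_interval
    return stalls, stalled_time


# 15 fps stream whose retrieval interval jitters by +/-50% (assumed model).
rng = random.Random(1)
interval = 1 / 15
clock, ready = 0.0, []
for _ in range(150):
    clock += interval * rng.uniform(0.5, 1.5)
    ready.append(clock)
for prefill in (1, 5, 15):
    print(prefill, simulate_sender(ready, interval, prefill))
```

Running the loop with a larger `prefill` shows how buffering units in advance
trades start-up delay against suspensions of the sender.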



4.2 Virtual Server System (VSS) 

Three main components of VSS are Media Receiver, Media Transmitter, and
Synchronizer, which are depicted in Figure 4. These three components of VSS
are similar to those in PSS. That is, the functions of Media Receiver (Media
Transmitter) in VSS are similar to those of Media Reader (Media Sender) in PSS.
For simplicity, we only describe the main features and functionality of VSS.

- Media Receiver. Because of the characteristics of static media, Static Media
Receiver cannot lose any media unit. Thus, RMTP is used between the
virtual servers and the physical server for static media. With the reliability
function of RMTP, each static media unit can be received.
Fig. 4. Architecture of the virtual server system.



- Media Transmitter. Media Transmitter retrieves one medium unit from the 
media buffer and then re-multicasts the medium unit to LAN’s clients ac- 
cording to the schedule description file. Two types of Media Transmitter 
are Continuous Media Transmitter and Static Media Transmitter, which are 
responsible for multicasting continuous and static media units respectively. 
RMTP is also used to transmit static media for reliability.

- Synchronizer. Synchronizer is also responsible for coarse-grain
synchronization and for coordinating the Transmitters’ behavior.

In VSS, the master-medium-based synchronization control, combined with the
adopted presentation scheme, which is either the content-oriented or the
time-oriented scheme, is executed at each synchronization point.

4.3 Client System (CS) 

CS starts the presentation after some commencement control is done, i.e., after
pre-depositing some media units in the buffer to compensate for the LAN’s
anomalies. Three main components of CS are Media Gather, Media Presenter, and
Synchronizer, which are depicted in Figure 5. In CS, an additional function of
Media Presenter is to achieve fine-grain synchronization between continuous
media when a tight temporal relation between audio and video
streams, e.g., lip synchronization, is needed.

The master Media Presenter controls fine-grain synchronization by issuing 
fine-grain control messages to all of the other non-master Media Presenters at 
the fine synchronization point. The slower Media Presenters have to keep pace 
with the fastest Presenter by discarding some media units to reach fine-grain 
synchronization points. Synchronizer controls the coarse-grain synchronization 
between two consecutive sections. Furthermore, at the beginning of each section,
Synchronizer has to assign one Presenter as the master and issue the fine-grain
synchronization flag to the master Presenter in order to notify it whether
fine-grain synchronization is needed in this section.

Fig. 5. Architecture of the client system.



5 Conclusion 

This paper describes the main issues of designing temporal control mechanisms 
for multicasting multiple-stream multimedia presentations. A formal definition is 
proposed for the behavior specification, which concisely specifies the seven tem- 
poral relationships of a DMP behavior. Based on the formal definition, we have 
developed the TVMS. TVMS is based on the proposed M^CN architecture and 
contains synchronization/presentation mechanisms to achieve multiple-stream 
multimedia temporal control. TVMS also provides generic supports for media 
specifications and multicast environment setup. System developers can incor- 
porate TVMS to develop DMPs more efficiently and effectively in a multicast 
environment. 
Abstract. In this paper we study the impact of the medium access
control (MAC) layer and the routing layer on the performance of a multi- 
hop wireless network. At the medium access control layer, we argue that 
the notion of per-node fairness employed by the IEEE 802.11 standard 
is not suitable for a multi-hop wireless network where flows traverse 
multiple hops. We propose a new MAC protocol that supports prioritized 
per-node fairness and significantly improves performance in terms of both 
throughput and fairness. At the routing layer, we show that load balanced 
routing improves performance regardless of the nature of the underlying 
MAC protocol. Moreover, we show that an ideal load balanced routing 
protocol should take into account both the hop counts and the capacities 
when computing the optimal path. We propose a new routing protocol 
that improves performance over the conventional shortest-widest path 
routing. 



1 Introduction 

Ad-hoc networks are multi-hop wireless networks that lack the services of an 
established backbone infrastructure. They are typically formed by a collection 
of mobile stations cooperatively establishing a multi-hop wireless network. In 
recent years, numerous approaches have been proposed for routing [6,7,11-13], 
and medium access control (MAC) [1,4,10] in ad-hoc networks. While a majority 
of the routing protocols are similar to shortest path routing in that they use 
hop count as the optimization metric, the MAC schemes are mainly based on 
the CSMA/CA protocol. In this paper, we revisit the throughput and fairness 
properties of shortest path routing and CSMA/CA based MAC protocols in ad- 
hoc networks. We show through simulations that the end-to-end throughput and 
fairness properties of these routing and medium access control schemes are poor. 
We present simple algorithms at the two layers that significantly improve the 
throughput and fairness. 

We make two key contributions in this paper: (i) We demonstrate that ex- 
isting MAC protocols for ad-hoc networks (e.g. IEEE 802.11 [2]), based on the 
per-node fairness paradigm of CSMA/CA, do not provide end-to-end throughput 
fairness. We argue for a departure from the notion of per-node fairness to that 
of per-flow fairness. We then present a new MAC protocol that has a per-flow
notion of fairness for channel access and achieves improved end-to-end
throughput fairness. (ii) We show that load balanced routing not only can improve the
end-to-end throughput observed by flows, but also can have a positive impact 
on the fairness observed by flows. We argue that a conventional load balanced 
scheme such as shortest-widest path algorithm will not provide optimal results 
in ad-hoc networks. Finally, we present a new load balanced routing algorithm 
that is suitable for the target environment. 

The rest of the paper is organized as follows: Section 2 presents the protocols
and algorithms that we use in the rest of the paper. Section 3 describes the sim- 
ulation model including the topology and traffic generation. Section 4 presents 
the simulation results. Section 5 discusses some issues and concludes the paper. 

2 Algorithms 

2.1 Medium Access Control 

We use the IEEE 802.11 MAC protocol as the reference protocol. In order to 
alleviate any unfairness that the implementation of IEEE 802.11 protocol might 
contribute [8], we have implemented an ideal, per-node-fairness based MAC pro- 
tocol (ILP) similar to the one presented in [9]. The ILP algorithm attempts 
to provide ideal, per-node fairness, and given a certain fairness level tries to 
maximize the throughput. Finally, we use an ideal per-flow-fairness based MAC 
protocol (IFP) that incorporates priorities in the ILP algorithm, where the pri- 
ority of a node is set proportional to the number of flows traversing the node. 
Figure 1 presents pseudo-code for the IFP protocol. Section 4 will present the
simulation results comparing the three protocols. 

2.2 Routing 

We use a simple shortest path routing algorithm as the reference protocol. Ini- 
tially, we show that the shortest-widest path algorithm is not suited to the ad- 
hoc network environment. For the rest of the simulations, we adopt a new load 
balanced routing algorithm that takes into account both the capacity (width) 
and the hop count (length) along a path. We assign a weight w to each “link” 
in the network, where w is proportional to the amount of contention at that 
link due to existing flows in the network. The shortest-widest path algorithm
would then translate into finding the path with the minimum maximum-weight
(MMW), while the new algorithm would involve finding the path with the
minimum aggregate-weight (MAW). Figure 2 presents the algorithm for the MAW
protocol. Note that a variation of Dijkstra’s algorithm (minimum
maximum-weight instead of minimum aggregate-weight) can be used to achieve MMW
routing with the same algorithm as shown in Figure 2. We show that the MAW
algorithm performs better than the MMW algorithm in terms of the mean and 
variance of the end-to-end throughput. Finally, we demonstrate that the load 
balanced algorithm improves the fairness irrespective of whether the underlying 
MAC protocol is fair or unfair. 
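
The load balanced routing procedure (Figure 2) can be sketched in Python as
below. This is a sketch under stated assumptions rather than the authors'
implementation: weights are kept per node as in the figure, a path's cost is
taken to include the source's weight, the graph is an adjacency dict, and the
names `best_path`, `load_balanced_routes` and `combine` are hypothetical.
Passing `combine=max` switches the path cost from aggregate weight (MAW) to
maximum weight (MMW); ties between equally wide paths are then not broken by
hop count.

```python
import heapq


def best_path(adj, weight, src, dst, combine):
    """Dijkstra over node weights; `combine` folds a node weight into the cost."""
    best = {src: weight[src]}
    prev = {}
    heap = [(weight[src], src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, float("inf")):
            continue                      # stale heap entry
        if u == dst:
            break
        for v in adj[u]:
            new_cost = combine(cost, weight[v])
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if dst not in best:
        return None                       # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def load_balanced_routes(adj, pairs, combine=lambda cost, w: cost + w):
    """Route pairs one by one, raising weights on and next to each route."""
    weight = {node: 1 for node in adj}
    routes = []
    for src, dst in pairs:
        route = best_path(adj, weight, src, dst, combine)
        routes.append(route)
        if route is None:
            continue
        for m in route[:-1]:              # every node except the destination
            weight[m] += 1
            for q in adj[m]:              # neighbours contend for the channel
                weight[q] += 1
    return routes, weight


# Diamond topology: two equally short paths between nodes 0 and 3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
routes, weight = load_balanced_routes(adj, [(0, 3), (0, 3)])
print(routes)
```

In the diamond example the second flow is steered onto the branch the first
flow did not use, because the first route raised the weights along its path.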
Input:
    Set F of source-destination pairs (s_i, d_i)
    Vector Degree
        where Degree(s_i) is the degree of node s_i
    Vector NumberOfFlows
        where NumberOfFlows(s_i) is the number of flows traversing node s_i
    Vector Priority
        where Priority(s_i) is the priority associated with node s_i
    Vector Allocation
        where Allocation(s_i) is the number of time slots allocated to node s_i

    (Both Priority(k) and Allocation(k) are set to 0 for all k during network
    initialization. The values carry over across iterations of the algorithm
    presented below.)

Output:
    Set T of source-destination pairs allowed to transmit in the current time slot
    Updated vector Priority
    Updated vector Allocation

Algorithm:
    Initialize set T to an empty set
    While F is not empty
        Find (s_i, d_i)
            such that s_i has the maximum value in the lexicographic
            ordering of (Priority(s_j), -Degree(s_j)) for all s_j in F
        Remove (s_i, d_i) from F
        Add (s_i, d_i) to T
        For each pair (s_j, d_j) in F
            If node s_j is adjacent to node d_i
                Remove pair (s_j, d_j) from F
            If node d_j is adjacent to node s_i
                Remove pair (s_j, d_j) from F
    For each pair (s_i, d_i) in T
        Increment Allocation(s_i) by 1
        Priority(s_i) <- Allocation(s_i) / NumberOfFlows(s_i)

Fig. 1. Ideal Per-Flow-Fairness Based MAC Protocol (IFP) 
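
One time slot of the selection loop in Fig. 1 can be written out in Python as
follows. The sketch follows the figure as printed and is not the authors' code:
the function name `ifp_time_slot`, the adjacency-dict representation, and
deriving Degree from the adjacency sets are assumptions made for illustration.

```python
def ifp_time_slot(pairs, adjacent, num_flows, priority, allocation):
    """Pick the pairs allowed to transmit in one slot and update the vectors.

    pairs      -- list of (source, destination) pairs with traffic (set F)
    adjacent   -- dict mapping each node to the set of its neighbours
    num_flows  -- dict: number of flows traversing each node
    priority, allocation -- dicts carried over between slots (updated in place)
    """
    degree = {node: len(neigh) for node, neigh in adjacent.items()}
    pending = list(pairs)
    allowed = []
    while pending:
        # Lexicographic maximum of (Priority, -Degree), as printed in Fig. 1.
        s_i, d_i = max(pending,
                       key=lambda pair: (priority[pair[0]], -degree[pair[0]]))
        pending.remove((s_i, d_i))
        allowed.append((s_i, d_i))
        # Drop pairs whose sender is next to d_i or whose receiver is next to s_i.
        pending = [
            (s_j, d_j) for (s_j, d_j) in pending
            if s_j not in adjacent[d_i] and d_j not in adjacent[s_i]
        ]
    for s_i, _ in allowed:
        allocation[s_i] += 1
        priority[s_i] = allocation[s_i] / num_flows[s_i]
    return allowed


# Six nodes in a line: 0 - 1 - 2 - 3 - 4 - 5 (example topology, assumed).
adjacent = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
num_flows = {node: 1 for node in adjacent}
priority = {node: 0.0 for node in adjacent}
allocation = {node: 0 for node in adjacent}
print(ifp_time_slot([(0, 1), (2, 3), (4, 5)], adjacent, num_flows,
                    priority, allocation))
```

Because Priority grows with the slots a node has already received, readers may
want to check the direction of the ordering against the original figure; the
sketch keeps the rule exactly as printed.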



3 Simulation Model 

We use the ns2 network simulator for our simulations [3]. While we have used
topologies of varying sizes (50, 100, and 200 nodes respectively) for our sim- 
ulations, we present only the results for the 100 node topology in this paper. 
Input:
    Set F of source-destination pairs (s_i, d_i)

Output:
    Set R of routes for all source-destination pairs in F

Algorithm:
    Initialize R to an empty set
    Initialize weight(s_j) to 1 for all s_j
    For each pair (s_i, d_i) in F
        Use Dijkstra’s shortest path algorithm to obtain route r_i
        For each node m on route r_i except for d_i
            Increment weight(m) by 1
            Increment weight(q) by 1 for all q that are adjacent to m
        Insert r_i in R



Fig. 2. Load Balanced Routing 



The nodes are uniformly distributed in a 1500m x 1500m grid. The simulation 
scenarios presented in this paper do not have any mobility. We will revisit the 
issue of mobility later in Section 5. The data rate of the underlying channel is 
set to 2 Mbps, and the transmission range is set to 250m. The traffic in the 
network consists of 25 bi-directional TCP flows between 25 pairs of randomly 
(uniformly distributed) chosen sources and destinations. The simulations are run 
for a period of 100 seconds. Each data point is an average over 10 simulations run 
with different seeds for the random distribution. We use the mean and the devi- 
ation as the metrics to compare the throughput and fairness respectively. Unless 
otherwise specified, the routing protocol used is shortest path routing (SPR). 

4 Simulations 

4.1 MAC and Fairness 

In Figure 3 we present the normalized deviation of the end-to-end throughput
for the three MAC protocols. We define normalized deviation for a scenario as 
the standard deviation normalized by the mean throughput achieved for that 
scenario. As seen, IEEE 802.11 exhibits a high degree of unfairness. Note that 
in addition to the reasons given shortly, IEEE 802.11 has been shown to exhibit 
unfairness even when providing per-node fairness, and this accounts for the dif- 
ference in its performance when compared to the ILP algorithm. The difference 
in performance between ILP and IFP can be briefly explained as follows: In 
ILP, nodes are given “equal” access to the channel irrespective of the number of 
flows traversing them. This results in lowered throughput for flows that traverse 
Fig. 3. MAC and Fairness



nodes handling a larger number of flows. However, in IFP, nodes are given access
to the channel in proportion to the number of flows for which they act as relays 
(routers). Hence, flows are not penalized for traversing “congested” nodes. This 
results in the improved fairness for IFP. 
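
The fairness metric used here is simple to compute. The following Python
sketch shows it for a list of per-flow throughputs; using the population
standard deviation is an assumption (the text does not say whether the
population or sample form is meant), and the function name is hypothetical.

```python
from statistics import mean, pstdev


def normalized_deviation(throughputs):
    """Standard deviation of per-flow throughput divided by the mean."""
    avg = mean(throughputs)
    if avg == 0:
        raise ValueError("mean throughput is zero")
    return pstdev(throughputs) / avg


print(normalized_deviation([100, 100, 100, 100]))  # identical flows
print(normalized_deviation([50, 150]))             # same mean, uneven split
```

A value of zero means every flow received the same throughput; larger values
mean the throughput is spread more unevenly around the mean.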

4.2 Load Balanced Routing 

In Figure 4 we present a comparison between the mean throughput achieved
by the MMW (minimum maximum-weight, or shortest-widest path) and the
MAW (minimum aggregate-weight) algorithms. As observed, the
MAW algorithm offers significantly more throughput than the MMW algorithm
irrespective of the MAC protocol used. The reason behind the improvement is
that the network is moderately to heavily loaded (16 kbps to 256 kbps),
and in such scenarios the longer hop counts (8.86 hops) of the MMW algorithm
result in the network being overloaded sooner than in the case of the MAW
algorithm (5.02 hops). Briefly, the larger hop count results in more
usage of the underlying network capacity:

Usage ~ NumberOfFlows * AverageHopCount * AverageFlowRate

As long as the total usage is less than the network capacity [5], the impact of 
larger hop counts is not noticed. However, when the network is heavily loaded, 
it is more likely that the larger hop count will result in the network becoming 
overloaded sooner, resulting in poor performance. 
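
As a rough worked example of the usage expression, the sketch below plugs in
the hop counts quoted in this section and the 25 flows of the simulation
model. Reading "~" as "proportional to" and treating the result as a relative
figure only are assumptions, and the function name is hypothetical.

```python
def relative_usage(num_flows, avg_hop_count, avg_flow_rate_kbps):
    """Usage figure from the expression above (proportional, not absolute)."""
    return num_flows * avg_hop_count * avg_flow_rate_kbps


for rate in (16, 256):                      # load range used in the paper
    mmw = relative_usage(25, 8.86, rate)    # MMW: 8.86 hops on average
    maw = relative_usage(25, 5.02, rate)    # MAW: 5.02 hops on average
    print(f"{rate} kbps: MMW {mmw:.0f}, MAW {maw:.0f}, ratio {mmw / maw:.2f}")
```

At any given flow rate the MMW paths consume about 8.86 / 5.02, or roughly
1.76 times, the capacity of the MAW paths, which is why MMW reaches overload
at a lower offered load.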

While the MAW algorithm is better in terms of the mean throughput, it can
be seen from Figure 5 that the algorithm performs better in terms of the fairness
Fig. 4. Load Balanced Routing: Mean



also. Recall that the normalized deviation, and not the absolute deviation, is used 
as the fairness index. 

4.3 Routing and Fairness 

In Figure 6 we present the impact of the routing algorithm on the end-to-end
throughput fairness. We again use the normalized deviation as the metric for 
fairness. When the underlying MAC is unfair, it is obvious that having a load 
balanced routing algorithm will improve fairness. This is because of the fact 
that load balancing reduces the average degree of multiplexing of flows on a 
single link, and hence bounds the unfairness introduced by the MAC protocol. 
This improvement in fairness is evident in Figure 6. However, it is interesting
to note that load balancing improves fairness even when the underlying MAC is 
fair with respect to flows. Briefly, the reasons for this improvement are twofold: 
(i) The transport protocol used is TCP, and TCP is unfair to flows with larger 
RTTs. Hence, when flows with different RTTs share a single link, the mechanics 
of TCP will result in the flow with the smaller RTT getting a greater portion of 
the link capacity. Load balanced routing reduces the overlapping of flow paths, 
and hence reduces such effects. (ii) Although the underlying MAC protocol is
fair, the variance in the degree of path overlapping (due to the existence of flows
that have no or minimal link sharing along their paths, along with flows that
share links with a large number of flows) will induce unfairness in the network. 
Load balanced routing reduces the variance in the degree of path overlapping, 
and hence improves fairness. 
Fig. 5. Load Balanced Routing: Normalized Deviation



4.4 Routing and Throughput Distribution 

In our simulations, we observe that shortest path routing occasionally exhibits 
higher average throughput than load balanced routing. While superficially this 
indicates better performance, a closer look at the average throughput
distribution between the different flows reveals that shortest path routing,
although exhibiting higher average throughput, punishes a large number of flows
(very low throughput in relation to the mean) in favor of a few flows that enjoy
throughputs significantly higher than the mean throughput. Figure 7 shows the distribution
of the number of flows observing different end-to-end throughputs. The distri- 
bution is a consolidation of the results of 10 simulations, and hence has a total 
of 500 flows. As seen in the figure, the peak of the distribution for load balanced 
routing is closer to the mean than that of shortest path routing. Moreover, load 
balanced routing has a consistently better distribution curve about the mean 
throughput value. Finally, it can be seen that the peak of the distribution for 
the shortest path algorithm at the right end of the graph (high throughput) is 
higher than that of load balanced routing, substantiating our earlier claims that 
SPR greatly favors a few flows. 

5 Issues and Summary 

5.1 Issues 

(i) Mobility: Due to lack of space, we do not consider mobility in the results 
presented thus far. However, the following observation can be made about the 
probable impact of mobility: While the shortest path and the MAW algorithms 
Fig. 6. Routing and Fairness



will suffer throughput degradation (possibly by the same amount) due to mo- 
bility induced losses, MMW can be expected to suffer significantly more losses. 
This is because MMW paths, by virtue of their longer hop counts,
are more likely to break because of link failures. (ii) Distributed Algorithms: The
new algorithms presented in this paper are centralized in nature. The scope of 
the paper is limited to highlighting the drawbacks of existing protocols and sug- 
gesting better approaches, and hence we do not present distributed versions of 
the algorithms. However, we believe that developing distributed versions of the 
algorithms introduced will not be a difficult task, and we hope to develop the 
distributed algorithms as part of our future work. 

5.2 Summary 

In this paper we have studied the performance of existing MAC and routing 
schemes in terms of their fairness and throughput characteristics. While we agree 
that the per-node fairness model adopted for packet cellular networks is apt for 
that environment, we argue that such a model is not suitable for an ad-hoc
network where nodes cooperatively act as routers or relays for flows belonging
to other nodes in the network. We propose a new MAC protocol that supports
a per-flow fairness model, and in the process achieves significantly better
end-to-end throughput fairness. At the routing layer, we show that a load balanced
routing scheme that takes into account both the capacity of paths and their hop 
counts is more suitable for ad-hoc networks than a conventional shortest-widest 
approach. We demonstrate through simulations that the new routing algorithm 
does better than shortest path routing both in terms of throughput distribution 
and fairness. 
Fig. 7. Routing and Throughput Distribution
Abstract. In this paper, we propose an approach to solve the power al-
location issues in a DS-CDMA cellular system using genetic algorithms 
(GAs). The transmitter power control has proven to be an efficient 
method to control cochannel interference in cellular PCS, increase band- 
width utilization and balance the wireless services. Most of the previous 
studies have assumed that the transmitter power level is controlled in a 
constant domain under the assumption of uniform distribution of users 
in the coverage area or in a continuous domain. In this paper, the optimal 
Centralized Power Control (CPC) vector is characterized and its opti- 
mal solution for CPC is presented using GAs in a large scale DS-CDMA 
cellular system. 



1 Introduction 

Transmitter power control is an effective way of increasing the system capacity
and transmission quality in cellular wireless systems. In DS-CDMA wireless
communication systems, the transmitter power allocated to each user, randomly
distributed in the coverage area (service zone), is regulated because it is seen
as interference by the other users. There has been significant work on the
channel assignment and power allocation strategies used in these systems;
previous works such as Refs. [1] [2] have focused on CPC. They investigated only
simple cases because of the difficulty of computing and searching for optimal
solutions.

In this paper, we first propose an approach to solve the power allocation
issues in a DS-CDMA cellular system using genetic algorithms (GAs), which are
powerful and broadly applicable stochastic search and optimization techniques
based on principles from evolution theory [2] [3] [4], to obtain globally
optimal solutions. Most of the previous studies [5] [6] assumed that the transmitter

* This work is supported in part by the Grants-in-Aid for Science Research 
No. 12680432 and by Research for the Future Program at Japan Society for the 
Promotion of Science 
power level is controlled in a constant domain under the assumption of uniform
distribution of users in the coverage area or in a continuous domain. In this
paper, the optimal Centralized Power Control (CPC) scheme, which helps in
the design of distributed power control schemes that are easy to implement, is
characterized, and its optimal solution is presented using GAs for a
typical case [1] and for a large-scale DS-CDMA cellular system under a realistic
test, meaning random allocation of the active users in the entire coverage area.
In addition, we use GAs to tighten both the balance of the signal-to-interference
ratios (SIRs) at the receiver and the convergence rate [3] [4] [5], the latter
being used to measure the rapidity of capturing the optimal solution.



2 Transmitter Power Allocation Problem 



In the cellular system, we assume $N$ users and $M$ base stations. All users use
the common radio channel in the DS-CDMA system. Let $p_i$ denote the transmitter
power of user $i$ so that $P = [p_1, p_2, \ldots, p_N]$ denotes the transmitter
power vector of the DS-CDMA cellular system. The corresponding received signal
power of user $i$ at base station $k$ is $L(i,k)\,p_i$, where $L(i,k)$ denotes
the gain for user $i$ to base station $k$. The interference seen by user $i$ at
base station $k$ is $\sum_{j \ne i} L(j,k)\,p_j$. It is assumed that the system
is interference limited and therefore noise is ignored. A mobile user $i$ uses
the base station $k$ which is closest to it for communication. All the $L(j,k)$
are greater than zero. The SIR of mobile user $i$ at its base station $k$ is
then written as [7] [8]



SIRi = 



PiL{i,k) 



Pi 






for 1 < i < N 



j,k 



( 1 ) 



where 



G. 






L(j,k) 

L{i,k) 

0 



for j i 
for j = i 



(2) 



and $\alpha$ is the voice activity factor. To balance the SIR, the problem of achieving the same SIR for all users in the system is expressed as 

$$\mathrm{SIR}^-_{opt} = \max_{P>0} \min_{1 \le i \le N} \mathrm{SIR}_i$$

$$\mathrm{SIR}^+_{opt} = \min_{P>0} \max_{1 \le i \le N} \mathrm{SIR}_i \tag{3}$$

where $\mathrm{SIR}^-_{opt}$ and $\mathrm{SIR}^+_{opt}$ are the maximum of the minimum value of $\mathrm{SIR}_i$ and the minimum of the maximum value of $\mathrm{SIR}_i$, $i = 1, \ldots, N$, respectively. Following the theorem and lemma of R. Vijayan and J. Zander [1], let us define G as an $N \times N$ matrix that has $G_{j,k}$ as its elements. The matrix G has a few important properties, which are described as follows. 

1. G is an irreducible nonnegative matrix. 

2. There exists a unique SIR* given by 

$$\mathrm{SIR}^* = \mathrm{SIR}^-_{opt} = \mathrm{SIR}^+_{opt} \tag{4}$$

Fig. 1. Power allocation issue in CDMA system 

So the same SIR* is achievable by all users. In a large-scale DS-CDMA system with users randomly located in its coverage area, it is not easy to find the optimized solutions. In this work, we adopt GAs to find them as fast as possible (the speed of this search is termed the convergence rate), which helps operators implement distributed power control schemes. 

The power allocation is encoded as a string whose total length is the sum of the power required by all users in the system, as shown in Fig. 1. The components of the algorithm are examined in the following subsections. 



2.1 Objective Function 

The objective function will essentially determine the survival of each chromosome 
by providing a measure of its relative fitness. The primary goal of any approach 
to solve the power control problem is to decrease the multi-user interference, 
balance the services and achieve the optimized power allocation, while satisfying 
the required quality of signal transmission. By assigning the power to each user 
to satisfy the same SIR for all users, the objective function which encompasses 
all of the considerations is described as 

$$V = \min \left| \mathrm{SIR}^+_{opt} - \mathrm{SIR}^-_{opt} \right| \tag{5}$$
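
As an illustration of how a candidate power vector can be scored, the following Python sketch evaluates the SIR of Eq. (1) for every user and the spread between the largest and smallest SIR that Eq. (5) drives to zero. It is a minimal sketch and not the implementation used in the simulations: the function names are hypothetical, `gain[i][j]` is assumed to hold the link gain from user j to the base station serving user i, and measuring the spread in dB is an assumption suggested by the termination parameter being quoted in dB.

```python
import math


def sir_values(power, gain, alpha=1.0):
    """Linear SIR of every user for a given power vector.

    gain[i][j] is assumed to be the link gain from user j to the base
    station serving user i; alpha is the voice activity factor.
    """
    n = len(power)
    sirs = []
    for i in range(n):
        interference = sum(gain[i][j] * power[j] for j in range(n) if j != i)
        sirs.append(gain[i][i] * power[i] / (alpha * interference))
    return sirs


def objective(power, gain, alpha=1.0):
    """Spread between the largest and smallest SIR in dB (assumed unit).

    A value of zero means that all users see exactly the same SIR.
    """
    sirs_db = [10.0 * math.log10(s) for s in sir_values(power, gain, alpha)]
    return max(sirs_db) - min(sirs_db)


# Hypothetical two-user gain matrix, used only to exercise the functions.
demo_gain = [[1.0, 0.1], [0.1, 1.0]]
balanced_spread = objective([1.0, 1.0], demo_gain)
unbalanced_spread = objective([2.0, 1.0], demo_gain)
```

In this toy case the symmetric allocation is already balanced, while doubling one user's power opens a gap between the two SIRs, which is the quantity a chromosome would be penalized for.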



2.2 Crossover 

After reproduction, crossover proceeds with a probability $p_c$. This operator takes two randomly chosen parent individuals as input and combines them to generate two children. The combination is achieved by choosing two crossing points in the strings of the parents and then exchanging the allelic values between these two points. Preliminary simulation results based on the fundamentals of CPC show that the simple crossover operator generates a significant number of configurations. To greatly speed up the convergence rate and the computation, the evolution therefore proceeds via the partially matched crossover (PMX) operator [2]. At the same time, we introduce a strategy of discarding invalid parents during PMX crossover. In addition, the crossover points and their number can be selected automatically according to the constraints of the power allocation. Furthermore, our solution representation further reduces the search space, which makes it well suited to investigating practical large-scale CDMA cellular wireless systems and to finding the optimized solution as quickly as possible. We term our crossover operator adaptive partially matched crossover (APMX). 

To implement APMX easily, each individual is represented by a real-number vector and processed with a nonlinear arithmetic crossover. We also create two First-In First-Out (FIFO) stacks to store the individuals. One is ordered by increasing $\mathrm{SIR}_i$, $i = 1, \ldots, N$, of the individuals and the other by decreasing $\mathrm{SIR}_i$, which simplifies crossover and speeds up the convergence rate. Parents are selected according to the ordering of the two stacks: of the two parents A and B, one is the individual with the maximum SIR and the other is the individual with the minimum SIR. The crossover is performed by first generating cross points, with the crossover length gradually decreasing during the entire crossover procedure. After the GA operators have been applied, we rank the individuals using the FIFO stacks and discard the individuals with very large differences $\mathrm{SIR}^+_{opt} - \mathrm{SIR}^-_{opt}$. 

Because our problem has a unique solution [1], and to speed up the convergence rate, we design the APMX algorithm so that nonlinear arithmetic functions are used in the crossover operator. The arithmetic crossover is defined as the combination of two chromosomes, $P_i(t)$, the chromosome with the maximum $\mathrm{SIR}^+_{opt}$ in the t-th generation, and $P_j(t)$, the chromosome with the minimum $\mathrm{SIR}^-_{opt}$ in the t-th generation, as follows: 

$$P_i(t+1) = P_i(t) - \lambda P_j(t)$$

$$P_j(t+1) = P_j(t) + \lambda P_i(t) \tag{6}$$

where, to obtain $\lambda$, the three types of nonlinear arithmetic exponent functions are introduced in the investigation as 

$$\lambda = \frac{1}{\beta + \mu t} \tag{7}$$

where $\beta$ and $\mu$ are control parameters, which determine the convergence rate of the GAs. 



2.3 Mutation 

The mutation operator provides opportunities for long jumps out of local minima, because the crossover operator may lead the search into a local minimum of the fitness function, as the generated children tend to be very similar to their parents. A low level of mutation serves to prevent any one element in the chromosome from remaining fixed to a single value in the entire population. On the other hand, a high level of mutation will essentially result in a random search. To maintain a balance between such extremes, a valid value for $p_m$ is 0.01 [3] [4]. 
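
The role of the mutation probability can be illustrated with a short Python sketch. The text does not specify how a mutated gene is altered, so the rule used here (redrawing the power uniformly between hypothetical bounds `p_min` and `p_max`) is purely an assumption for illustration, as are the function and parameter names.

```python
import random


def mutate(power, p_m=0.01, p_min=1e-3, p_max=1.0, rng=random):
    """Return a mutated copy of a real-valued chromosome.

    Each gene is replaced with probability p_m. The replacement rule
    (a uniform draw in [p_min, p_max]) is an assumption, not taken from
    the paper.
    """
    return [
        rng.uniform(p_min, p_max) if rng.random() < p_m else gene
        for gene in power
    ]


parent = [0.5] * 270
child = mutate(parent, p_m=0.01, rng=random.Random(0))
changed = sum(1 for a, b in zip(parent, child) if a != b)
```

With $p_m = 0.01$ only a few genes of a 270-user chromosome change per generation on average, which keeps the search from degenerating into a random walk.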



2.4 Selection 



The selection operator produces individuals with a higher potential to be optimal solutions. It is very important because it must strike a trade-off between two opposing and undesirable tendencies. On the one hand, individuals with higher fitness should have more chances to reproduce. On the other hand, if only the fittest individuals are selected to generate the new generation, the search may converge quickly to locally optimal solutions. We therefore adopt two FIFO stacks with stack depth N. The same parents, A and B, are selected twice to reproduce new children. From the resulting doubled number of children, it is easier to select a fitter new generation after first discarding some of the worst individuals. With this procedure, fitter individuals have a higher probability of being chosen than weaker ones. Following simple roulette-wheel selection, the probability of any individual i being selected from the population may be defined as 

$$\varphi(i) = 1 - \left| \mathrm{SIR}^+_{opt} - \mathrm{SIR}^-_{opt} \right| \tag{8}$$



2.5 Termination Criteria 

To balance the signal-to-interference ratios at the receivers and to speed up the convergence rate, the individual strings in the current population are processed by the genetic operators described in the above subsections to form a new generation. If the best candidate in the generation does not violate any of the problem’s constraints, the search may terminate. In each iteration step, the search can also be terminated when there are no significant changes between two successive generations in the difference 

$$V(t) = \min \left| \mathrm{SIR}^+_{opt}(t) - \mathrm{SIR}^-_{opt}(t) \right| \tag{9}$$

The GA search is then terminated when the following condition is satisfied: 

$$V(t) < \delta \tag{10}$$

where $\delta$ is the termination parameter. 
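
The stopping rule of Eq. (10) can be sketched in a few lines of Python. The generation cap `max_generations` is a hypothetical safeguard added for illustration and is not part of the criterion stated above.

```python
def should_terminate(v_t, delta, generation, max_generations=200):
    """Stop when V(t) < delta, or when a hypothetical generation cap is hit.

    v_t        -- SIR spread V(t) of the best individual in generation t
    delta      -- termination parameter (same unit as v_t)
    generation -- current generation index t
    """
    return v_t < delta or generation >= max_generations


# Example with delta = 0.01 dB, the value used for the three-user case.
stop_early = should_terminate(0.005, 0.01, generation=10)
keep_going = should_terminate(0.5, 0.01, generation=10)
stop_at_cap = should_terminate(0.5, 0.01, generation=200)
```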
3 Simulation Results 

3.1 A Typical Example 

Following Ref. [1], the CPC scheme is first illustrated on a small example with only three mobiles using the same channel, with link gains given by the matrix G as follows: 

$$G = \begin{pmatrix} 1.0 \times 10^{-4} & 4.82253 \times 10^{-9} & 3.57346 \times 10^{-10} \\ 1.52416 \times 10^{-8} & 6.25 \times 10^{-6} & 3.50128 \times 10^{-9} \\ 7.67336 \times 10^{-10} & 2.44141 \times 10^{-8} & 1.23457 \times 10^{-6} \end{pmatrix}$$

Power control is simulated for each individual, whose target SIR is assumed to be as large as possible. Figs. 2 and 3 show the results without and with FIFO, respectively. Starting from a random initial power allocation for each user, the genetic algorithm with FIFO proves superior to the case without FIFO in 50 generations: in Fig. 2, more than 200 generations are necessary to reach $\delta = 0.01$, whereas with FIFO convergence is reached at the 160th generation. We also see that in Fig. 2 a longer execution time is needed to obtain better results, and that Fig. 3 shows more rapid progress in convergence than Fig. 2. Regarding the nonlinear arithmetic exponent function, when $\beta = 1$ and $\mu = 1$ the power re-allocated between two chromosomes is largest, which leads to the best convergence rate. 

Fig. 2. $\mathrm{SIR}^+_{opt}$ and $\mathrm{SIR}^-_{opt}$ versus the generation without FIFO ($\delta = 0.01$ dB, $\alpha = 1$, N = 3) 

Fig. 3. $\mathrm{SIR}^+_{opt}$ and $\mathrm{SIR}^-_{opt}$ versus the generation with FIFO ($\delta = 0.01$ dB, $\alpha = 1$, N = 3) 

For this typical example, if we use equal transmitter powers, the SIRs of the three users are 42.85 dB, 25.23 dB and 16.90 dB, respectively. After the simulation by GAs, we obtain three equal SIRs of 24.74 dB, the same result as reported in Ref. [1]. We also obtain the unique solution of the power allocation problem as $1.79 \times 10^{-3}$, $8.67 \times 10^{-2}$ and $5.11 \times 10^{-1}$, respectively. We see an improvement of 7.8 dB in minimum SIR by the CPC strategy. 
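
The figures quoted for this example can be cross-checked without running a GA. The Python sketch below is an independent check under stated assumptions and not the method of this paper: it reads row i of G as the gains seen at the base station of mobile i, takes the exponents of G as the ones consistent with the equal-power SIRs quoted above, and replaces the genetic search with a shifted power iteration on the normalized gain matrix, which converges to the balanced power vector because G is irreducible and nonnegative. All function names are hypothetical.

```python
import math

# Link gains of the three-mobile example; G[i][j] is read as the gain from
# mobile j to the base station serving mobile i (assumed orientation).
G = [
    [1.0e-4, 4.82253e-9, 3.57346e-10],
    [1.52416e-8, 6.25e-6, 3.50128e-9],
    [7.67336e-10, 2.44141e-8, 1.23457e-6],
]


def sir_db(power, gain, alpha=1.0):
    """SIR of every mobile in dB for the given power vector."""
    n = len(power)
    out = []
    for i in range(n):
        interference = sum(gain[i][j] * power[j] for j in range(n) if j != i)
        out.append(10.0 * math.log10(gain[i][i] * power[i] / (alpha * interference)))
    return out


def balanced_power(gain, iterations=2000):
    """Perron vector of the normalized gain matrix by shifted power iteration."""
    n = len(gain)
    # Z[i][j] = G[i][j] / G[i][i] off the diagonal and 0 on it.
    z = [[0.0 if i == j else gain[i][j] / gain[i][i] for j in range(n)]
         for i in range(n)]
    # A positive shift leaves the Perron vector unchanged and avoids the
    # oscillation that a plain iteration on Z can show.
    shift = max(sum(row) for row in z)
    p = [1.0] * n
    for _ in range(iterations):
        q = [shift * p[i] + sum(z[i][j] * p[j] for j in range(n)) for i in range(n)]
        top = max(q)
        p = [x / top for x in q]
    return p


equal_power_sir = sir_db([1.0, 1.0, 1.0], G)
p_star = balanced_power(G)
balanced_sir = sir_db(p_star, G)
# Rescale so that the largest power matches the value quoted in the text.
p_scaled = [0.511 * x for x in p_star]
```

Under these assumptions the equal-power SIRs and the balanced SIR agree with the values quoted above to within rounding, and the rescaled power vector is close to the reported allocation.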

3.2 A Large Scale Cellular Wireless System with CDMA 

In our simulation environment, we consider a general multi-cell CDMA cellular system on a rectangular grid. In this system, there are nine base stations with (x, y) coordinates $(1000i + 1000,\ 1000j + 1000)$ for $0 \le i$, $0 \le j$. The x and y coordinates of each user are independent, uniformly distributed random variables between 0 and 60 km. Figure 4 shows the positions of the base stations and an example of randomly distributed users in the system when we set $N_c = 30$ users/cell. 

Figure 5 shows the users’ SIRs versus the user number when equal power is assigned to each user. The SIR varies rapidly from user to user, which means that some users have better transmission quality and some have worse. This does not satisfy our aim of balancing the services, especially in an integrated wireless cellular system. 

The GAs with FIFO converge better to the unique optimal solution. In the investigation of the large-scale cellular system, if the FIFO strategy is not adopted, the CPU execution time becomes much longer. For a real-time problem, the approach then loses its purpose of solving the CPC problem in a large-scale cellular system. Moreover, the GAs might not converge at all, because these methods are blind search techniques that look for optimal solutions in the entire solution space. We therefore use the FIFO strategy in the investigation of the large-scale cellular system. Figure 6 shows the convergence of the users’ SIRs with the maximum and the minimum value of SIR. We see that SIR* reaches the target optimal value after more than 50 generations for $\beta = 1$ and $\mu = 1$. When $\beta$ increases to $\beta = 10$, good results are obtained much sooner, near 15 generations, and the algorithm performs much better in these simulation runs. 

Fig. 4. Simulation environment for the number of active users, $N_c$, and the nine-cell cellular wireless system ($N_c = 30$ users/cell) 

Fig. 5. SIR [dB] versus the user number without power allocation ($p_1 = p_2 = \cdots = p_N = 1$, $\alpha = 0.375$, N = 270) 

According to the simulation results, the final unique optimal solution, that is, the best SIR*, is -11.812542 dB whichever nonlinear arithmetic exponent function is used in the GAs. The power allocation obtained by CPC to achieve this is shown in Fig. 7 for the system structure shown in Fig. 4. From this graph, we see that a larger amount of power is allocated to the users located at the boundaries between cells. The user requiring the most power is located at approximately (0, 40000), and the user requiring the least power is located at approximately (10000, 30000), around the 4th BS. We also see that the power allocated to the users located in the 3rd and 6th cells varies slowly and is small, because the user density in this area is lower, as shown in Fig. 4. 

Fig. 6. $\mathrm{SIR}^+_{opt}$ and $\mathrm{SIR}^-_{opt}$ versus the generation with FIFO for the large cellular system ($\delta = 0.1$ dB, $\alpha = 0.375$, N = 270) 

Fig. 7. Allocation of transmitted power for Fig. 4 in the entire coverage area 

4 Conclusions 

The simulation results show that genetic algorithms are robust for optimal power allocation. 

In this investigation, to speed up the convergence rate and filter out illegal solutions, we introduced the nonlinear arithmetic exponent function and the FIFO strategy, which may improve the convergence rate of genetic algorithms. We then effectively simulated centralized power control in a realistic, large-scale cellular wireless system and obtained better results. 

The main benefit of these simulation results is that they provide an estimate of CPC performance and a basis for the design of DPC in the system. Furthermore, they provide reference results for the design of burst admission algorithms or of systems with varying processing gains. 
Abstract. The last few years have seen a great number of new network applications. Many of these applications are characterized by their multipoint features and by their need for high bandwidth. To allow the deployment of these applications at a large scale in the Internet without running the risk of network implosion, an adapted congestion control mechanism has to be used. In this paper we propose to use the Fair Scheduler (FS) paradigm for end-to-end congestion control to design a congestion control algorithm for the Reliable Multicast Transport Protocol (RMTP). The FS paradigm supposes that all the network routers use a fair scheduler queueing mechanism. This hypothesis allows us to derive worst case bounds on the queueing delay that a block of RMTP data packets experiences between the sender and any receiver in the multicast group. Based on these bounds the sender regulates its transmission rate in order to avoid network congestion. Two different versions of this algorithm are presented in detail. The coexistence of RMTP and TCP is examined in a suite of tests that evaluate the performance of the new algorithm over a Fair Queueing (FQ) scheduled network in the presence of TCP flows sharing the same network resources as the RMTP flow. The same tests are rerun over a FIFO scheduled network in order to assess the advantages of using the scheduling discipline as a network support for congestion control in the Internet. 



1 Motivations and Related Work 

The continual emergence of new applications that do not use the unicast Transmission Control Protocol (TCP) as the underlying transport protocol risks destabilizing the TCP/IP Internet. These new applications are characterized by their multipoint features, like audio/video conferencing, multi-party games and file distribution. Given that TCP is not appropriate for multipoint transport and that multicast applications have different needs, a general consensus in the Internet community seems to be that no single multicast transport protocol is able to satisfy the needs of all multicast applications. As a consequence, many multicast transport protocols have been specified, some for real-time applications like RTP/RTCP, others for bulk data transfer like MFTP, etc. Besides the large number of Internet users (300 million users are expected in 2001), the arrival of these new multicast applications increases Internet traffic exponentially. To prevent the explosion of the Internet it becomes imperative to determine some rules of coexistence between unicast and multicast protocols. This paper mainly addresses this issue. We focus on the transport layer protocols, where a real conflict arises between these protocols because Internet resources are limited and there are no deterministic rules that allow unicast and multicast control functions to share these resources in a fair manner. In the current Internet these control functions are located at the end hosts; the network is still passive and does not play an active role in the control of data flows. The TCP-friendly paradigm [3] was introduced to devise congestion control protocols compatible with TCP. However, it seems unable to satisfy the needs of new applications such as multipoint applications. A TCP-friendly source-based congestion control scheme for multicast flows has to adapt its sending rate to the worst receiver (in terms of loss). Numerous multicast flows, including RMTP, would suffer from the application of this scheme. Therefore, new research efforts are oriented toward new schemes for end-to-end congestion control. These schemes intend to satisfy the needs of more applications without destabilizing the Internet. The Fair Scheduler (FS) paradigm is one of these schemes. 

In a previous publication we presented a simplified model of RMTP (Reliable Multicast Transport Protocol) that we simulated using the Network Simulator NS. In our model RMTP uses a TCP-like end-to-end congestion control mechanism. Based on this model we tested the performance of the protocol over a FIFO scheduled network. The simulation results show that the protocol performs badly when used to multicast data to receivers that are very heterogeneous in criteria like bandwidth, loss rate and delay, and that the worst subgroup in terms of these criteria determines the performance of the protocol. The simulation also shows the impact on the protocol’s performance of what we have called the “loss misunderstanding” problem. This problem appears when the sender considers as lost packets that in reality are still in transit to the receivers and not yet acknowledged when the new sending interval starts, which prevents the sender from advancing its window. This phenomenon is mainly due to the fact that over a FIFO scheduled network it is not possible to have worst case bounds on delay, which makes it very difficult to adjust the transmission rate as a function of the group’s physical conditions. 

The organization of this paper is as follows: In Section 2 we briefly present the FS paradigm and propose a new congestion control algorithm for RMTP based on this paradigm. The performance evaluation of the new RMTP congestion control algorithm is presented in detail in Section 3, and finally Section 4 concludes this paper. 
2 FS Based Congestion Control Algorithm for RMTP 

In general, support from the network can help congestion control. However, this support, which can range from simple buffer management to active networking, is still not completely trusted by the Internet community. This distrust is mainly due to the aim of not violating the end-to-end principle of the Internet and to the lack of a clear idea about what kind of router support to use. One of the simplest ways to use network support is to change the scheduling discipline inside the routers. Biersack and Legout propose to deploy PGPS-like¹ scheduling inside routers in a new congestion control paradigm called the FS paradigm. The main assumptions of the FS paradigm are a fair scheduler network, i.e. a network where every router implements a fair scheduler, and end users that are assumed to be selfish and non-collaborative. A user acts selfishly if he only tries to maximize his own satisfaction without taking the other users into account. When users do not use the same congestion control mechanism, they are considered non-cooperative. In this section we design a congestion control algorithm for RMTP based on the FS paradigm. 



2.1 Work Context 

We consider an RMTP virtual tree, as shown in Figure 1. Receivers are grouped into subtrees, called local groups. For each subtree, one receiver is chosen to be the local group master (called designated receiver or DR). Each DR is responsible for processing the receivers’ status messages and for achieving local recovery in its local subtree. The sender sends data via multicast to all the receivers in the group. Receivers in turn periodically send their status messages to the associated DR. Each DR plays the role of the sender in its subtree and retransmits lost data in its subtree. The DRs periodically send their status messages up to the sender, which in turn processes these messages and performs global recovery. Our RMTP model uses a hybrid window-based and rate-based congestion control mechanism. The sender sends data via multicast at regular intervals with a period of $\mathit{interval}$. The maximum number of data packets that can be sent in one interval is $W_s$. So during one interval the maximum transmission rate is $\mathit{Max\_Tr} = W_s / \mathit{interval}$. 



2.2 Worst Case Bounds on Delay for RMTP Session 

To determine the value of $\mathit{Max\_Tr}$ we proceed as follows. For a given value of $W_s$ we compute, thanks to the first assumption of the FS paradigm, the value of $D_{max}(1,K)$, which represents the maximum queueing delay that the $W_s$ packets experience on a network path between node 1 and node K. In general a GPS² server m that serves N sessions on a link is characterized by N positive real numbers $\phi_1^m, \ldots, \phi_N^m$. These numbers denote the relative amount of service given to each session, in the sense that if $S_i^m(\tau, t)$ is defined as the amount of session i traffic served by server m during an interval $[\tau, t]$, then 

$$\frac{S_i^m(\tau, t)}{S_j^m(\tau, t)} \ge \frac{\phi_i^m}{\phi_j^m}, \qquad j = 1, 2, \ldots, N \tag{1}$$

for any session i that is continuously backlogged in the interval $[\tau, t]$. (A session is backlogged at time t if a positive amount of that session’s traffic is queued at time t.) Thus (1) is satisfied with equality for two sessions i and j that are both backlogged during the interval $[\tau, t]$. 

¹ Packet General Processor Sharing 

² General Processor Sharing 

Fig. 1. RMTP Architecture 

From Equation (1), whenever a session i is backlogged it is guaranteed a minimum service rate of $g_i^m$, where $r^m$ is the rate of the link represented by node m. $g_i^m$ is called the session i backlog clearing rate, since a session i backlog of size q is served in at most $q / g_i^m$ time units. In consequence, $D_{max}(1,K)$ is computed from this guaranteed rate. 

The propagation delay $D_p(i,j)$ between two consecutive nodes i and j on the path from node 1 to node K is considered to be constant. So the overall propagation delay from node 1 to node K is $D_{prop}(1,K) = \sum_{n=1}^{K-1} D_p(n, n+1)$ and the maximum overall delay is $D = D_{prop}(1,K) + D_{max}(1,K)$. 
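
The quantities introduced in this subsection can be sketched numerically in Python. The guaranteed rate used below, $g_i^m = \phi_i^m r^m / \sum_j \phi_j^m$, is the standard GPS guarantee and is taken as an assumption here; the function names and the example figures are hypothetical, and the per-path bound $D_{max}(1,K)$ itself is not reproduced.

```python
def guaranteed_rate(weights, session, link_rate):
    """Minimum service rate of a backlogged session at one GPS server.

    weights   -- the positive numbers phi_1 ... phi_N of the server
    session   -- index of the session of interest
    link_rate -- rate r of the link served by this node
    Uses the standard GPS share phi_i / sum(phi) * r (assumed here).
    """
    return weights[session] / sum(weights) * link_rate


def backlog_clearing_time(backlog, weights, session, link_rate):
    """Upper bound on the time needed to serve a backlog of the given size."""
    return backlog / guaranteed_rate(weights, session, link_rate)


def total_propagation_delay(hop_delays):
    """Sum of the constant per-hop propagation delays along the path."""
    return sum(hop_delays)


# Hypothetical node: three sessions, the third holding half of the weight.
rate = guaranteed_rate([1.0, 1.0, 2.0], session=2, link_rate=10.0)
clearing = backlog_clearing_time(20.0, [1.0, 1.0, 2.0], session=2, link_rate=10.0)
d_prop = total_propagation_delay([0.01, 0.02, 0.01])
```

A session holding half of the total weight on a 10 pkt/s link is thus guaranteed 5 pkt/s, so a block of 20 packets queued at that node is cleared within 4 s in the worst case.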

2.3 Optimistic RMTP Congestion Control Algorithm 

In this case the DR positively acknowledges the packets it receives during an interval without waiting for the receivers’ status messages. In other words, we consider that if a packet is correctly received by the DR, then the DR will be able to retransmit this packet to the receivers in its local group. However, to assume this responsibility the DR has to keep a copy of every received packet in its local buffer until this packet is acknowledged by all the receivers in the local group. The sending interval is considered to be composed of two consecutive phases: a sending phase and a listening phase, with periods T1 and T2 respectively. T1 is equal to the maximum delay required for any of the DRs to receive the $W_s$ packets, i.e. $T1 = D_{total}(Sender, DR)$, while T2 represents the time the sender spends listening to the status messages of the DRs. 

When using the optimistic algorithm the sending interval has to be long enough to allow the DR to retransmit lost packets in its local group before the next interval starts. In general a DR might need more than one retransmission cycle in order to ensure a fully reliable transfer to all receivers in its local group. If the number of these retransmission cycles is N, then the value of $\mathit{interval}$ can be computed using the following formula: 

$$D_{total}(DR, Receiver) + \sum_{i=1}^{N} \left[ D_{hack}(Receiver, DR) + D_{lost,i}(DR, Receiver) \right] \le D_{hack}(DR, Sender) + T2 \tag{4}$$

where $D_{total}(DR, Receiver)$ is the maximum delay the $W_s$ packets experience between a DR and any of the receivers in the associated local group, $D_{hack}(Receiver, DR)$ is the maximum delay a HACK experiences between any receiver and the associated DR, $D_{lost,1}(DR, Receiver)$ is the delay required to retransmit the lost packets sent by the sender, and $D_{lost,i}(DR, Receiver)$ is the maximum delay required to retransmit the newly lost packets retransmitted by the DR in the (i-1)th retransmission loop. We define a “retransmission loop” as the associated events (receivers send HACKs, DR retransmits lost packets). N is the number of retransmission loops required to have zero loss. In consequence the value of $\mathit{interval}$ has to satisfy the relation $\mathit{interval} \ge D_{total}(Sender, DR) + T2$. Fixing the value of T2 in advance, and in consequence the value of $\mathit{interval}$, limits the choice of N to an upper bound. When the loss rate increases, this upper bound might not allow enough retransmission loops to achieve full reliability, and this is exactly the drawback of determining N as a function of $\mathit{interval}$. On the other hand, if $\mathit{interval}$ is determined as a function of the required N, the delay of the session will increase and the throughput will decrease. It is quite difficult to choose the optimal value of the tuple ($\mathit{interval}$, N), since it depends strongly on the loss rate that the receivers suffer from. 
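
The trade-off between T2 and N can be made concrete with a small Python sketch that counts how many retransmission loops fit into relation (4) for a fixed T2. It is an illustration under assumptions: the function name and the delay values are hypothetical, and the relation is read as non-strict.

```python
def max_retransmission_loops(d_total_dr_rcv, d_hack_rcv_dr, d_lost,
                             d_hack_dr_snd, t2):
    """Largest N for which relation (4) still holds.

    d_total_dr_rcv -- D_total(DR, Receiver)
    d_hack_rcv_dr  -- D_hack(Receiver, DR)
    d_lost         -- list whose entry i-1 is D_lost,i(DR, Receiver)
    d_hack_dr_snd  -- D_hack(DR, Sender)
    t2             -- listening period T2 of the sender
    """
    budget = d_hack_dr_snd + t2
    used = d_total_dr_rcv
    loops = 0
    for d_lost_i in d_lost:
        used += d_hack_rcv_dr + d_lost_i
        if used > budget:
            break
        loops += 1
    return loops


# Hypothetical delays in seconds.
lost_delays = [0.05, 0.03, 0.02]
loops_long_t2 = max_retransmission_loops(0.10, 0.02, lost_delays, 0.02, 0.21)
loops_short_t2 = max_retransmission_loops(0.10, 0.02, lost_delays, 0.02, 0.05)
```

With the longer listening period two loops fit, whereas the shorter one leaves no room for any retransmission loop, which is the limitation on N described above.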

2.4 Pessimistic RMTP Congestion Control Algorithm 

In this case the DR has to wait for the acknowledgments of the receivers in its local group. Here it is very important that the DR performs at least one retransmission loop before it aggregates the status messages of the receivers in its local group with its own status message and sends a single status message back to the sender. This is expressed by the following relation: 

$$D_{total}(Sender, Receiver) + 2 \cdot D_{hack}(Receiver, DR) + D_{lost,1}(DR, Receiver) + D_{hack}(DR, Sender) \le \mathit{interval} \tag{5}$$
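
Relation (5) gives a lower bound on the sending interval, and hence an upper bound on the transmission rate $W_s / \mathit{interval}$. The Python sketch below evaluates both; the function names and the delay values are hypothetical and chosen only for illustration.

```python
def min_pessimistic_interval(d_total_snd_rcv, d_hack_rcv_dr, d_lost_1,
                             d_hack_dr_snd):
    """Smallest sending interval allowed by relation (5)."""
    return d_total_snd_rcv + 2 * d_hack_rcv_dr + d_lost_1 + d_hack_dr_snd


def max_transmission_rate(w_s, interval):
    """Maximum transmission rate W_s / interval for one sending interval."""
    return w_s / interval


# Hypothetical delays in seconds and a window of 20 packets.
interval = min_pessimistic_interval(0.25, 0.125, 0.25, 0.25)
rate = max_transmission_rate(20, interval)
```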

3 Performance Evaluation of the New RMTP Congestion Control Algorithm 

In this section we evaluate the performance of the optimistic and the pessimistic congestion control algorithms presented previously in two cases. The first case is that of a single RMTP session in the absence of any other unicast or multicast flows on the network. The second case is that of a single RMTP multicast session that has to share the network resources with multiple TCP sessions. The network topology in both cases consists of one sender, one DR and a set of 10 receivers associated with the DR, as shown in Figure 2. The performance metrics are the number of retransmissions carried out by the DR, the average delay per packet for all receivers and the average throughput of all receivers. In all the tests the value of $W_s$ is set to 20 pkt/s. The simulations have all been done using the network simulator NS. The simulation time of all the tests is set to 100 s. A suite of tests that evaluates the performance of an isolated RMTP flow over a Fair Scheduler network as a function of the error rate attached to each receiver was carried out first using the optimistic algorithm (for N = 1 and N = 2) and then using the pessimistic algorithm. The results of these tests are shown in Figures 3, 5 and 7. 

Another suite of tests has been carried out to evaluate the performance of the RMTP flow as a function of the number of TCP flows that share the same Fair Scheduler network resources. The first TCP flow runs between the node of the RMTP sender and the first receiver in the multicast group, the second between the RMTP sender and the second receiver in the multicast group, etc. This suite of tests has been carried out using the optimistic algorithm (N = 2) as well as the pessimistic algorithm. The results of these tests are shown in Figures 4, 6 and 8. 

The previous results show that the throughput of the RMTP flow is generally higher with the optimistic algorithm than with the pessimistic one. In contrast, the delay difference between the two algorithms is very small, which means that the DR fulfills its role in local recovery efficiently and that the need for global recovery in the case of the pessimistic algorithm is small. The simulation results show that the throughput of the RMTP flow decreases progressively with the number of TCP flows that share the same network resources. This was expected, since the service the RMTP flow receives in each node of the network decreases when the node serves multiple flows in a fair manner. 

Fig. 2. Network topology 

To compare the performance of RMTP over a Fair Scheduler network with that over a FIFO scheduler network we repeated the previous tests using the pessimistic algorithm over a FIFO scheduler network. The results of these tests are shown in Figures 3, 4, 5, 6, 7 and 8. The title of each of the corresponding graphs starts with “FIFO”. 

The previous graphs show clearly that when the RMTP flow is the only flow in the network, FIFO routers and FQ (Fair Queueing) routers give practically the same results. In contrast, when the RMTP flow competes with other flows for the limited network resources, FQ routers give better performance than FIFO routers. 

4 Conclusions and Perspectives 

In this paper we have presented our work on the design of a new end-to-end
congestion control algorithm to be used with the Reliable Multicast Transport
Protocol (RMTP). This algorithm is based on the FS paradigm for end-to-end
congestion control proposed by Biersack and Legout. The use of the FS paradigm
allowed us to obtain worst-case bounds on the queueing delay that a block of
packets experiences on a network path between two nodes. These worst-case bounds
are used to compute the sending interval of an RMTP source. In other words, the
use of the FS paradigm allows the RMTP sender to regulate its sending interval
as a function of the minimum service that the RMTP flow will receive in the
nodes of the multicast tree. We have proposed two versions of this algorithm, an
optimistic and a pessimistic version. In the optimistic case, the DRs are
considered able to retransmit packets in their local groups under all circumstances.
In contrast, in the pessimistic algorithm a DR might not be able to achieve
error recovery in its local group and will therefore ask the sender for global
recovery. The optimistic algorithm shows superior performance compared to
the pessimistic one in terms of average throughput.

Fig. 3. Throughput results: Isolated RMTP
Fig. 4. Throughput results: RMTP with TCP
Fig. 5. Delay results: Isolated RMTP
Fig. 6. Delay results: RMTP with TCP
Fig. 7. DR retransmission results: Isolated RMTP
Fig. 8. DR retransmission results: RMTP with TCP

The performance of RMTP in the presence of TCP flows on the network
has then been evaluated for a Fair Scheduler network and for a FIFO Scheduler
network. The simulation results in the first case show that the performance of
RMTP decreases progressively as the number of TCP flows increases. In contrast,
in a FIFO scheduler network, RMTP performance decreases significantly as the
number of TCP flows increases. Thus, when using the FS paradigm, each multicast
and unicast flow always receives a minimum amount of service in all routers of
the network, and their coexistence becomes more viable. In this paper we have
focused on the performance evaluation of RMTP under the FS paradigm. However,
to complete this work we have to study the impact of this paradigm on the
performance of TCP flows, which is one of the subjects of our future work.



Abstract. In this paper, we propose an architecture for Quality of Service
(QoS) control with a proxy server for multimedia applications in heterogeneous
communication environments of wired and wireless networks. The concept is
based on a traffic scheduling mechanism at the application level for supporting
various mobile user traffic streams. The proxy server is located in the base
station and is responsible for user profile management and QoS adjustment. Via
an application user interface, the user can click on a quality button to send
user feedback expressing the QoS actually required. The concept is furthermore
characterized by the gathering of the queue lengths of packet flows and the
calculation of the loss probability for QoS monitoring in the proxy server.



1 Introduction 

Quality of Service (QoS) is an important factor in the service capability of
networks. QoS management has been the subject of a considerable amount of
research over several years. Providing QoS guarantees is undoubtedly one of the
great challenges in multimedia communication networks, not only in fixed networks
including the Internet, but also in wireless networks.

Until now, most work has addressed the development of individual network
components such as end-systems, transport and network protocols, protocols and
functions in the management plane, etc. Some works proposed general frameworks
for QoS management, but most of them require changing the whole protocol structure.
Summaries of QoS architectures can be found in [1][2]. Well known are the QoS
concepts related to layer three of the OSI reference model, such as IntServ and
DiffServ, as well as MPLS (Multiprotocol Label Switching). QoS concepts on
lower layers are proposed especially for Local Area Networks (LANs), which are
based on OSI layer two. An example is the Subnet Bandwidth Manager (SBM) for shared
and switched 802 LANs such as Ethernet (also FDDI, Token Ring, etc.). Other layer-two
technologies, such as ATM (Asynchronous Transfer Mode), have also been QoS enabled.
A number of QoS architectures for end-systems have also been proposed. Such
architectures are briefly described in [1], where it was argued that commonalities
exist between the QoS control and strategies found in the end-system and in the
network. This seems to be the case at first glance, but the goals in designing
control mechanisms differ between the intermediate system (routers, switches) and
the end-system. The extent to which network-level QoS mechanisms are applicable in
the end-system (or vice versa) is still an open issue.

Naturally, the concepts proposed for fixed networks can also be applied, with
careful consideration, to wireless networks. However, this is critical in wireless
networks, where the wireless links are more error prone than the wired parts. In
addition, there is a discrepancy in performance between desktop computers and
handheld terminals of various types. The changes in network and technology have
increased the heterogeneity of distributed multimedia systems. Moreover, in
mobile and wireless systems the quality of service of connections may change rapidly
over time. Due to the varying transmission characteristics of the wireless links, the
provision of QoS guarantees in wireless networks and heterogeneous environments is
rather limited, which calls for new directions of research in QoS treatment over
wireless network environments. There is a need for mechanisms to compensate for the
performance gap, and hence the gap in quality of service, which persists
between different types of end-systems. Taking user preferences into account
will also be indispensable in future networks.

In this paper, we present a framework for the design of such mechanisms, which
addresses the following three questions: Where is the suitable location for QoS
control? How can the QoS gap be compensated? How can the QoS be monitored and
adjusted? To answer these questions, we propose a proxy server located in the base
station. The proxy server receives the data stream sent from a fixed host, and
transforms and forwards the data packets to the mobile terminal according to the user
profile and the currently available link bandwidth. The transmission of data packets
occurs at the application level using a scheduling algorithm and a reference time
slot system. The QoS is dynamically adjusted according to the link bandwidth measured
by the proxy server. At any time, the customer can send user feedback to the proxy
server to express his/her preferred QoS. In this paper we focus on QoS management of
video streams, since video is the most critical component in multimedia applications.

The rest of the paper is organized as follows. In Section 2, we describe the
background of this research and review related work. In Section 3, the proposed QoS
management architecture is presented. Section 4 presents experiments with the
concept. Finally, Section 5 concludes the paper.



2 Background and Related Works 

Much effort has gone into constructing QoS architectures to support end-to-end
QoS management. Most of these are predicated on the availability of guaranteed
services and concentrate on the implementation of QoS mechanisms in networks and
end-systems, as well as on their coordination, in order to provide QoS support to the
application [3]. However, multimedia applications have fluctuating demands on network
resources, which may also vary in a heterogeneous environment. A brief overview of
future wireless internet access architectures is given in [4]. That paper discusses
the cornerstones of wireless internet access, including proxy structures. Proxy
structures enable the distribution of internet protocol operation between the
end-system and the network. Examples are WAP (Wireless Application Protocol), Snoop (a
protocol booster) and ReSoA (Remote Socket Architecture). All these approaches
take into account that the wireless links should be treated in a special manner.

In contrast to the main QoS issues in wired networks, QoS degradation in
wireless networks is due not only to congestion, but also to the possibly high
bit error rate (BER) of the radio channels, fading, interference between channels,
etc. Thus, the usual retransmission for recovering lost packets is no longer suitable
or efficient. Forward Error Correction (FEC) has the disadvantage of complex overheads.
New solutions are necessary. To date, efforts have been made to improve
radio link technologies, to develop new and efficient compression techniques, and to
improve the protocol stacks [5] [6].

Future networks are likely to remain heterogeneous due to the continuing
changes in network technology. No single technology can cover all aspects of
communication applications. The coexistence of various technologies and
devices restricts the overall QoS support for all users. A customer with a
desktop-type receiver would certainly accept a different perceived QoS than a
customer with a handheld receiver. This QoS gap should be compensated by a suitable
QoS mechanism that is flexible enough to adapt to wireless links and user preferences.

Recently, many QoS-aware applications have been developed which are capable of
adapting to their environment. Recall that, from the network's point of view, the QoS
guarantee is the ability of the network to fulfill the different requests of the
customers' applications. If the network cannot satisfy the requests of the users due
to a lack of resources, the applications have to adapt to this situation. Adaptive
applications can also adapt their operation to deal with variations in the QoS
characteristics of networks. An example is play-out buffer adaptation [3]. The
other possibility is the QoS on-demand approach [3] [7] [8], in which the user can
actively react to achieve as good a QoS as possible. In this case, there is a need for
controlling the user feedback and for monitoring QoS. Some QoS-controlled
applications have been developed which enable the user to take control of the QoS.
In [8], Bechler et al. distinguish three types of applications: common, adaptive and
proactive applications. Unlike adaptive applications, which are able to react to
changes in the network condition, proactive applications actively affect the
scheduling of resources. However, all these approaches either concentrate on the QoS
manager in end-systems or do not consider the interactions between the applications
and their behaviour.

In our approach, we suggest using a proxy server in the base station to compensate
for the gap between the different requirements of mobile terminals. Our approach
differs from the proxy proposed in [7] in that we suggest an active proxy structure.
Active means that we use traffic scheduling at the application level to manage the
data streams sent to mobile terminals. Why is traffic scheduling necessary? The
reason is that it lets us give different priorities to the data streams sharing the
same resources (buffer, bandwidth) in the base station. When link bandwidth is
lacking, it is necessary to decide how to drop data packets to ensure the QoS for
certain connections. In addition, we developed a graphical user interface that allows
the user to send a feedback signal to express the desired quality of the connection.



3 QoS Management Concept on the Application Level 

Figure 1 shows the principle of the proposed architecture in the context of wireless 
multimedia communication. The QoS proxy server takes the role of the QoS manager 
and the resource manager at the application level. In addition, it manages the user 
profiles. These components are described later in this section. 




Fig. 1. QoS management architecture

As discussed before, a mobile computer has fewer resources than a stationary
computer, and the properties of wireless networks differ from those of wired networks.
Consider a situation where the fixed host sends too much multicast data to the
mobile terminals when the link bandwidth of the wireless network becomes low. In this
case, the application typically has to reduce the data rate sent to the mobile
terminals, and thus the QoS of all receivers is affected. In such a heterogeneous
environment, the proxy server filters the data from the Internet and sends them to the
mobile terminals according to their performance, QoS requirements and bandwidth
allocation. We believe that this proxy structure is also advantageous for charging and
billing purposes.



3.1 The Basic Concept 

The proxy server has three functions: QoS management, resource
management and user profile management. The proxy server receives and stores the
user preference sent from the mobile terminal. On receiving data streams from the
Internet, the QoS manager filters packets, converts data and sends the transcoded data
to the receiver. The filter function was already mentioned in [7] and in some other
works (QoS-A, discussed in [1]). If the bandwidth of the wireless link becomes low, the
QoS manager adapts the data streams to the actual bandwidth condition (e.g. discards
unimportant packets, converts video to monochrome, etc.). The resource manager is
responsible for monitoring the actual link bandwidth and allocating bandwidth to
different data streams using a scheduling algorithm. The connections are given
different weights according to their QoS requirements. A queue system is used for
scheduling. A reference time slot system helps to define the priorities of the
connections. Operating at the application level is advantageous for the transcoding
function. In addition, the concept need not depend on the underlying network system.

A simple concept model is shown in Figure 2 for a video transmission application.
The receiver is a mobile terminal and the sender may be another mobile terminal
or a host in the wired network. For a video call, the call processing is as follows. The
sender sends a connection setup message to the call manager of the wireless network.
We suppose that the call signaling is done by the wireless network and a connection is
set up. The receiver then sends a short message containing his preferences,
and his user profile is stored in the proxy server. This information comprises the
codec, frame rate, display size and color depth. These are the application-level QoS
parameters. The proxy server sets up a queue for the connection with a corresponding
weight, i.e. the connection is allocated bandwidth according to the weight. The sender
application then submits a video stream to the respective queue in the base station.
This data stream is transcoded according to the user profile, i.e. the required QoS of
the receiver, and is sent to the receiver terminal. When pushing data packets into
queues, the proxy server marks them as belonging to important or unimportant video
frames. To determine the service order of the data packets, a bandwidth scheduler
is used. We suggest using the adaptive bandwidth scheduler proposed in [9],
which is able to adjust the weights of connections. A reference time slot system is
proposed to represent the actual order in which packets will be transmitted on the
physical link.



Fig. 2. The concept model for video transmission application



During the connection, the proxy server monitors the actual link bandwidth
condition and adjusts the QoS. The bandwidth is measured as the amount of data
received per unit of time: the receiver counts the packets it receives and sends the
count to the resource manager. If the link bandwidth is very scarce, or when the
receiver is not reachable, the proxy server discards unimportant frames or even all
video frames and sends only audio during the critical time.
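
As a rough illustration of this monitoring step, the following Python sketch estimates
the link bandwidth from the number of bytes the receiver counted during a reporting
interval and then picks an adaptation action. The function names and the two
thresholds (low_kbps, critical_kbps) are hypothetical; the paper does not specify them.

```python
def estimate_bandwidth_kbps(bytes_received: int, interval_s: float) -> float:
    """Bandwidth seen by the receiver: data received per unit of time."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return bytes_received * 8 / 1000 / interval_s


def choose_action(bandwidth_kbps: float, reachable: bool = True,
                  low_kbps: float = 128.0, critical_kbps: float = 32.0) -> str:
    """Pick an adaptation step. The thresholds are illustrative assumptions."""
    if not reachable or bandwidth_kbps < critical_kbps:
        return "audio_only"          # discard all video frames, keep audio
    if bandwidth_kbps < low_kbps:
        return "drop_unimportant"    # discard unimportant video frames only
    return "forward_all"             # enough bandwidth: forward everything
```

In such a sketch the receiver would report its byte count periodically, and the proxy
would re-evaluate the action after each report.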

By gathering the queue lengths, a prediction of the QoS can be made. An increase in
the queue length means that the connection is being served worse or that the sender is
sending more data packets. In either case, there is a risk of packet loss and an
accompanying QoS reduction. Two responses are possible: either the customer sends user
feedback to request a better QoS, or the proxy server itself adjusts the QoS to the
current bandwidth condition. A user interface has been developed which allows the user
to express his preference, i.e. the quality he/she wants to pay for, by clicking on a
button in the display window. Upon the click, a user feedback signal is sent to the
proxy server. The proxy server then varies the weight of the user's connection queue,
so that it receives more bandwidth compared to other connections.
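
To illustrate how a weight change translates into bandwidth, the following Python
sketch assumes that the link bandwidth is divided among the connections in proportion
to their weights. This proportional rule, the function names and the numeric values
are assumptions made for illustration only; the scheduler actually used is the one
proposed in [9].

```python
def bandwidth_shares(weights, link_kbps):
    """Split link_kbps among connections in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one positive weight is required")
    return {conn: link_kbps * w / total for conn, w in weights.items()}


def adjust_weight(weights, conn, step):
    """Return a copy of weights with conn's weight changed by step (minimum 1)."""
    updated = dict(weights)
    updated[conn] = max(1, updated[conn] + step)
    return updated


# Two connections with equal weights share a 100 kbit/s link equally.
weights = {"alice": 1, "bob": 1}
before = bandwidth_shares(weights, 100)

# One click on the plus button raises alice's weight by a step of 2.
after = bandwidth_shares(adjust_weight(weights, "alice", 2), 100)
```

Under these assumptions, the click moves the first connection from half of the link to
three quarters of it, at the expense of the other connection.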

In the following, we describe the components developed in the concept, including the
graphical user interface, the user profile, the user feedback and the QoS scheduler.
The concept demonstration for video transmission was written in C and Tcl/Tk.



3.2 User Interface

The simple user interface is depicted in Figure 3. It consists of a video display area
and three main buttons. When the call button is clicked, its label changes to
stop. The destination address has to be entered, either as the IP address or as the
name of the target computer. The second button is for setting the user preference. The
user preference is selected at call setup; it is also possible to send the preference
to the proxy server during the call. The Quality button (minus or plus, i.e. worse or
better QoS) is for user feedback.



Fig. 3. Graphical User Interface

The other buttons are for setting the video device, such as the port of the video
card, or for setting the X-Window for tests without a video camera. On the receiving
side, the application receives the transcoded video stream from the base station,
decodes it and plays out the video stream in the display area.





Fig. 4. Window for setting the user profile



3.3 User Profile

The user profile contains information about the technical characteristics of the
user's mobile terminal. This information includes the display size, the color depth,
etc. Furthermore, it includes the receiving preferences of the customer, for example
the desired codec, image size, brightness, contrast and the maximum bandwidth.

By clicking on the user profile button, the customer sets the desired parameters
expressing the preferred application-level QoS. These parameters are then sent to the
proxy server and stored in a user profile. Figure 4 shows the currently defined
parameters. Various video data compression techniques have been
defined by standards organizations. Among them are MPEG-1 and MPEG-2 of the Moving
Picture Experts Group, and H.261 and H.263 of the ITU (International Telecommunication
Union). Another technique, for still image compression, is JPEG (Joint Photographic
Experts Group). However, in the concept demonstration we use only three examples
to examine the feasibility of the proposed concept.

Parameters affecting the video playout are brightness, contrast, frame size and
color depth. The user can specify these parameters with a simple click in the setting
window or with the slider bar. The frame size or resolution corresponds to the number
of pixels in one frame. It can be set to one of three values: small (160x120), medium
(320x240) and large (640x320). The brightness and contrast are preset to 60 and can
be varied from 0 to 100. These parameters can be observed on the display of the
receiving window together with other information such as the actual frame rate, the
actual received bandwidth, the loss rate, etc. The maximum frame rate is 30 frames per
second, which is a typical limit for video transmission.
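
The profile parameters described above can be summarized in a small data structure.
The following Python sketch is only an illustration: the class and field names are
hypothetical, the default codec and the lower frame rate bound of 1 are assumptions,
and only the value ranges (frame sizes, brightness and contrast from 0 to 100 with a
preset of 60, maximum frame rate of 30) are taken from the text.

```python
from dataclasses import dataclass

# Frame sizes as listed in the text (width, height in pixels).
FRAME_SIZES = {"small": (160, 120), "medium": (320, 240), "large": (640, 320)}
MAX_FRAME_RATE = 30  # frames per second


@dataclass
class UserProfile:
    codec: str = "H.261"          # assumed default, not stated in the text
    frame_size: str = "medium"
    brightness: int = 60          # preset value from the text
    contrast: int = 60            # preset value from the text
    frame_rate: int = MAX_FRAME_RATE

    def __post_init__(self):
        if self.frame_size not in FRAME_SIZES:
            raise ValueError(f"unknown frame size: {self.frame_size}")
        for name in ("brightness", "contrast"):
            if not 0 <= getattr(self, name) <= 100:
                raise ValueError(f"{name} must be between 0 and 100")
        if not 1 <= self.frame_rate <= MAX_FRAME_RATE:
            raise ValueError("frame rate out of range")

    def pixels_per_frame(self) -> int:
        width, height = FRAME_SIZES[self.frame_size]
        return width * height
```

Validating the values when the profile is created keeps out-of-range settings from
reaching the transcoding step.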



3.4 User Feedback

The user feedback is given via a simple interface with two buttons: a plus and a
minus button for quality. By clicking on one of these buttons, the user sends a special
message indicating USER_FEEDBACK to the proxy server. This expresses the
quality of service desired by the customer. One way to do this is to send a single
parameter indicating the desired quality, for example a utility value. The proxy
server then has to map this value to network parameters such as bandwidth, delay,
loss, etc. An algorithm is necessary to translate the application-level QoS parameters
into network QoS parameters. The mapping can be done by a translation function or a
table.
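
A table-based mapping of this kind could look like the following Python sketch. The
utility levels, the parameter names and all numeric values are invented for
illustration; the paper does not define a concrete table.

```python
# Hypothetical mapping table: utility level -> network-level QoS targets.
QOS_TABLE = {
    1: {"bandwidth_kbps": 64, "max_delay_ms": 400, "max_loss": 0.05},
    2: {"bandwidth_kbps": 128, "max_delay_ms": 300, "max_loss": 0.03},
    3: {"bandwidth_kbps": 256, "max_delay_ms": 200, "max_loss": 0.01},
}
LOWEST, HIGHEST = min(QOS_TABLE), max(QOS_TABLE)


def clamp(level: int) -> int:
    """Keep a utility level inside the range covered by the table."""
    return min(max(level, LOWEST), HIGHEST)


def map_utility(level: int) -> dict:
    """Translate a utility value into network QoS parameters by table lookup."""
    return dict(QOS_TABLE[clamp(level)])


def apply_button(level: int, button: str) -> int:
    """Move one level up or down for a click on the plus or minus button."""
    step = {"plus": 1, "minus": -1}[button]
    return clamp(level + step)
```

A translation function could replace the table once a quality function of the kind
discussed in this section is available.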

However, we have not yet developed the mapping function in the system. Instead,
we currently use a simple approach in which the user sends the
feedback message with an explicit wish on quality, e.g. bandwidth, image size, etc.
We are investigating the development of a quality function to express in general the
quality desired by the user. The quality function should express the utility of the
user and is a function of the bandwidth, the efficiency of the transmission protocol
and the error rate. This is a topic for future work.



3.5 QoS Scheduling and Monitoring

As mentioned above, the proxy server has three functions. The functions of QoS
management and resource management are closely related and, together
with the queue system, form a QoS scheduling system. According to the QoS requirements
of the user, the proxy server sets the weight for the customer's connection.

The proxy server periodically receives information about the link bandwidth
measured by the receiver and adjusts the quality of the connection to the actual link
condition. When bandwidth is lacking, the proxy server will decrease the frame rate,
discard unimportant frames, decrease the color depth, or even discard all video
frames, so that only voice is transmitted in the worst case. For this purpose, the
video packets are marked as important or unimportant with an indicator in the
packet header. The information about the actual bandwidth is calculated from the
packets actually received by the receiver and is sent to the proxy server in a special
message indicating BANDWIDTH_INFO.




Fig. 5. The concept model for video transmission application 



Figure 5 shows the principle model for QoS scheduling in the proxy server. As shown
in the figure, the proxy server manages a queue system with a separate queue for each
downlink connection. Incoming packets from different flows arrive in the queue system.
The queue lengths are gathered and the packet loss probability is calculated. Based on
this probability and the feedback from the receiver, the proxy server monitors the QoS
of the packet flows and adjusts the QoS to the actual link condition and the QoS
requirements of the user. We consider a
reference time slot system based on the 16 time slots of a W-CDMA frame (as
described in the UMTS standards). In packet mode, the base station has to select a
packet from a connection during each downlink time slot. By giving different
weights to the queues, we can determine the order of packets from different connections.
Upon user feedback, the weight of the corresponding connection is changed. The decision
process then selects the packets from the queues according to their weights, so that a
virtual time slot order is realized. The incoming packets are marked as belonging to
important or unimportant frames, as mentioned above. For each queue, the
scheduler maintains two parameters, namely the start tag ST and the finishing tag FT.
The scheduling algorithm is defined as follows [9]:

$$ST_i^k = \max\{ TA_i^k, FT_i^{k-1} \}$$

$$FT_i^k = ST_i^k + \frac{L_i^k}{g_i(t)}$$

where i is the index of the queue, $TA_i^k$ is the arrival time of packet k in queue
i, $g_i(t)$ is the bandwidth allocated to queue i (the weight of the connection), and
$L_i^k$ is the packet size in bytes (including the packet header, i.e. the PDU size).

Whenever a packet arrives in a queue, its start tag and finishing tag are
calculated by the scheduler. The decision is based on the finishing tags, i.e. the
packet with the smallest finishing tag is pushed first to the layer below. In this
way, the order of packets is determined. A packet that is pushed first into the lower
layer will, intuitively, also be served first by the lower layer and will take
precedence over other packets for transmission on the wireless link. Without this
virtual reference time slot system, the packets would simply be pushed into the layer
below and served in FIFO (First In First Out) manner. Upon a user feedback signal,
g_i(t) is adjusted and the finishing tag of the corresponding queue is updated.
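
The tag computation and the smallest-finishing-tag rule can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the scheduler of
[9]: the class and method names are hypothetical, the weight is treated as a rate in
bytes per time unit, and a weight change only affects packets that arrive afterwards
(tags of packets already queued are not recomputed).

```python
import heapq
import itertools


class TagScheduler:
    """Serve packets in order of increasing finishing tag."""

    def __init__(self, weights):
        self.weights = dict(weights)                   # queue id -> g_i
        self.last_finish = {q: 0.0 for q in self.weights}
        self._heap = []                                # (finish, seq, queue, label)
        self._seq = itertools.count()                  # tie-breaker for equal tags

    def enqueue(self, queue_id, arrival_time, size_bytes, label=None):
        # Start tag: arrival time or finishing tag of the previous packet.
        start = max(arrival_time, self.last_finish[queue_id])
        # Finishing tag: start tag plus packet size divided by the weight.
        finish = start + size_bytes / self.weights[queue_id]
        self.last_finish[queue_id] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), queue_id, label))
        return start, finish

    def set_weight(self, queue_id, weight):
        """User feedback: packets arriving later use the new weight."""
        if weight <= 0:
            raise ValueError("weight must be positive")
        self.weights[queue_id] = weight

    def dequeue(self):
        """Return (queue id, label, finishing tag) of the next packet to send."""
        finish, _, queue_id, label = heapq.heappop(self._heap)
        return queue_id, label, finish
```

With weights 1 and 2 for two queues and equally sized packets arriving at the same
time, the first packet of the queue with the larger weight gets the smaller finishing
tag and is pushed to the lower layer first.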

The relation between the bandwidth allocation and the queue length can be
described by the following equation [9]: $C_i(t) = B \cdot (1 + X_i(t))$, where B is a
constant, $X_i$ is the length of queue i, and $C_i$ is the bandwidth allocated to
queue i. The probability of packet loss is $P = P(X = X_{\max})$, where $X_{\max}$ is
the maximal length of a queue. That is the probability with which the maximum queue
length is reached. An increase in this probability means that there are more packets
in the queue: either the sender is sending more packets or the link bandwidth has
become lower. It is then the task of the proxy server to discard unimportant video
packets or to adjust the weight of the queue, i.e. to increase the bandwidth for the
connection, in order to maintain the desired quality.
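
The two quantities can be computed as in the following Python sketch. The bandwidth
relation follows the equation above; estimating the loss probability as the fraction
of queue-length samples in which the queue was full is an assumption made here, since
the text does not say how the probability is calculated. The function names are
hypothetical.

```python
def allocated_bandwidth(queue_length: int, base: float) -> float:
    """C_i(t) = B * (1 + X_i(t)) for a queue of the given length."""
    return base * (1 + queue_length)


def loss_probability(queue_samples, max_length: int) -> float:
    """Fraction of samples in which the queue had reached its maximum length."""
    samples = list(queue_samples)
    if not samples:
        return 0.0
    full = sum(1 for length in samples if length >= max_length)
    return full / len(samples)
```

A rising estimate would then trigger either the discarding of unimportant packets or
an increase of the queue weight.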



4 Experiment Results 







Fig. 6. Frame rate evolution with respect to weight

We carried out experiments on a laboratory testbed consisting of three computers: a
video sender, a video receiver and the proxy server. Each computer is a Sun machine,
and the machines are connected to each other over the Local Area Network. One key
issue is the weight adjustment. The result of the first experiment is shown in Fig. 6.
The weight was increased in steps of 2 at times t=20s, t=40s and t=60s, and decreased
at t=100s and t=120s, according to the clicks on the plus and minus QoS buttons,
respectively. The adjustment step is an implementation issue and depends on the
available bandwidth. The frame rate, i.e. the QoS, improves as the weight increases.
At the times of change, the frame rate drops briefly due to the updating in the system.



5 Conclusion

In this paper, we have discussed the issues of providing QoS in wireless
communication networks. We proposed an architecture for Quality of Service (QoS)
control with a proxy server for multimedia applications in heterogeneous
environments of wired and wireless networks. The concept is characterized by a QoS
scheduling mechanism at the application level. The proxy server is located in the
base station and is responsible for user profile management and QoS adjustment. We
developed a graphical user interface that allows the user to send a feedback signal to
express the desired quality of the connection. The concept is furthermore characterized
by the gathering of the queue lengths of packet flows and the calculation of the loss
probability for QoS monitoring. The QoS is dynamically adjusted according to the
link bandwidth measured by the proxy server.

In future work, we intend to investigate the mapping of application-level QoS
into network QoS using a utility function. Furthermore, we intend to extend the
concept by integrating the utility function, the QoS scheduling, the loss probability
and the QoS adjustment.



Abstract. In this paper we argue that overlay multicast is an important
technology for applications requiring a group communication service.
With this approach end-hosts (running the application), dedicated
servers and/or border routers automatically self-organize into a distribution
topology where data is disseminated. This topology can be composed
of both unicast connections and native multicast islands (e.g. within each
site). Therefore it offers a group communication service to all hosts, even
those located in a site that does not have access to native multicast
routing.

One of the issues raised is the setup of an efficient and robust overlay
topology. In this paper we discuss several possible solutions. We show in
particular the benefits of having a centralized approach, of using redundant
links and of updating the topology based on a host stability criterion.

Keywords: group communications, multicast routing, overlay topology, 
application-level multicast 



1 Introduction 

Group communication traditionally requires that each node at each site has
access to a native multicast routing service. While intra-domain multicast (within a
LAN or a site) is widely available, this is not the case for inter-domain multicast.
Today many ISPs are still reluctant to provide a wide-area multicast routing
service [5]: there are technical reasons (many aspects are still research topics),
marketing reasons (e.g. which pricing model to use) and a “chicken and egg” problem.

In the early years of the MBone, the traditional solution was to set up a
tunnel to a site connected to the MBone. Because of its limitations, this solution
is now banned from new native PIM-SM/MSDP/MBGP deployments [1]. Another solution
is to use a reflector [4]. A reflector is a host which is connected to the multicast
backbone and which creates point-to-point connections to all the remote hosts
that do not enjoy inter-domain multicast routing. Although this solution creates hot
spots within the network, it is set up only for a limited span of
time (the session duration) and for a limited number of groups (those of the
session), unlike tunnels.

* This work has been done in the DSE (Distributed Systems Engineering) IST 1999-10302
European project. The consortium includes industrial companies involved in
the space business: Alenia Spazio S.p.a., EADS Launch Vehicles, and IABG, as well
as technology providers and research centers: Silogic, Societa Italiana Avionica (SIA),
University of Paris 6 (LIP6), CNRS-LAAS, and D3 Group. More information can
be found on the DSE web site: http://cec.to.alespazio.it/DSE/

P. Lorenz (Ed.): ICN 2001, LNCS 2093, 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

Neither of these solutions is satisfactory, even though reflectors are frequently used.
One of the goals of overlay multicast (also known as Host-Based Multicast, End
System Multicast, or Application-Level Multicast) is to enable every host to
participate in multicast sessions efficiently, whether or not it has access
to native multicast.



2 General Overlay Multicast Specificities 

2.1 Specificities Compared to Traditional Multicast 

The Overlay Multicast (OM) approach differs in many respects from traditional 
multicast routing: 

— First of all, a forwarding node in the overlay topology can be either an end-host
(i.e. running the application), a dedicated server within the site, or a
border router. In contrast, traditional multicast trees only include core
routers.

— With an overlay, the underlying physical topology is completely hidden. A
directed virtual graph is created between all the nodes. The virtual point-to-point
links are assigned a weight corresponding to the one-way distance
between nodes (several metrics are possible). Undirected graphs can also be
used if the possibility of having asymmetric routes is overlooked.

— Another consequence is that an overlay topology, built on top of the existing 
infrastructure, can integrate different flavors of multicast and unicast pro- 
tocols (e.g. several areas running different intra-domain protocols like PIM, 
MOSPF, DVMRP can be connected together in an OM topology). 

— In traditional multicast, the membership knowledge is distributed in the 
core multicast routers. With an OM, group members are known either by a
Rendez-vous Point (RP)¹ [6], by the source, or by everybody, or the membership
knowledge is distributed among members [3].

— The overlay topology is potentially completely under control. For instance, 
our proposal takes advantage of the additional knowledge centralized at the 
RP (node identity, distances, node/link specificities) during the topology 
creation process. 

— Yet a major drawback of involving end-hosts as transit nodes is that it 
reduces the reliability of the group communication service. Indeed an end- 
host is less reliable than a router or a physical link. For instance, if the 
overlay service is implemented as a library, then the node disappears if the 
application crashes or is stopped. Simulation results are given in section 3.3.



¹ This OM RP is different from PIM-SM RP.
— Another drawback is that the scalability of OM is lower than that of native
multicast. This is another motivation for using multicast areas in the overlay 
whenever possible as all the nodes of this area are hidden behind a single 
OM correspondent. 



3 Our Proposal: Host-Based Multicast (HBM) 

3.1 Sketch of Our HBM Proposal 

Our HBM proposal [10] distinguishes core members (CM) that are part of the 
core distribution topology and non-core members (nonCM) that graft on the 
existing topology as leaves. This distinction is based on several criteria explained 
in section 3.2. Everything is under the control of a central rendez-vous point
(RP). This RP knows CMs and nonCMs and the distance between them (several
metrics are possible here). This RP is responsible for calculating the OM topology
and informing CMs/nonCMs. CMs periodically evaluate the distance between
them and inform the RP. Likewise nonCMs evaluate their distance with CMs 
and inform the RP. 

Of course this scheme: 

— has a limited scalability. . . 

More generally any OM solution based on point-to-point communications has 
scalability problems (even if having a central RP in HBM adds some more 
limitations). Yet many collaborative work sessions only include a limited 
number of hosts/sites and scalability is not a problem then. Besides a single 
HBM node can easily serve many local participants using the locally available 
multicast. 

— and greatly relies on the RP reliability . . . 

If the RP is collocated with the primary source (if any), then this is not an 
issue as any failure of the source host would anyway compromise the service. 

On the other hand: 

— this is a simple solution. . . 

As all the information is centralized in the RP, there is no coherency prob- 
lem and it does not create too much load on the nodes (an asset in case of 
lightweight hosts like PDAs) . This is completely different in distributed solu- 
tions like [3] where each node runs various algorithms for group maintenance 
and incremental mesh quality improvement. 

— which can easily create a “not too bad” topology. . . 

The topology is optimal with respect to the distance database at the time
of its creation². The update frequency of the distance database depends
on various criteria like the group size (the larger the group, the lower the 
frequency) and node specificities (a powerful workstation can update the 
database more frequently than a PDA). 

² More precisely, it is only limited by the ability of the topology solver to find an
optimal solution. 
3.2 Offering a Robust Group Communication Service

We argue that robustness is a key issue for OM solutions, which are intrinsically
fragile (sections 2.1 and 3.3). To improve it we introduce three mechanisms that
all take advantage of the centralized knowledge at the RP.



The Need for Redundancy. First of all we add some redundancy in the topology.
An algorithm, presented in Annex A, adds a certain number of Redundant
Virtual Links (RVL) until the probability of having a partitioned topology after
a node failure falls below a predefined threshold. This solution is not source
dependent and therefore the OM robustness is the same no matter how many
sources there are and where they are located.

Of course some loops are created. Yet RVL are clearly identified as such and 
using a simple suppression mechanism is easy: 

if (node N receives traffic both on the OM link and RVL)
    send a SUSPEND message on the RVL
    // on receiving a SUSPEND, the peer stops forwarding packets on the RVL
    // during a few seconds

if (node N receives traffic on the RVL but not on OM link)
    // there is a problem, yet N still receives new traffic and can
    // forward them on the OM
    wait some time and send a failure report to RP if situation persists
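
The suppression rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name RvlMonitor, the report_after grace period and the string action names are hypothetical and do not come from the HBM specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RvlMonitor:
    """Per-node state for one Redundant Virtual Link (hypothetical helper)."""
    report_after: float = 5.0               # assumed grace period, in seconds
    rvl_only_since: Optional[float] = None  # when RVL-only reception started

    def on_interval(self, now: float, got_on_om: bool,
                    got_on_rvl: bool) -> Optional[str]:
        """Return the action to take for one observation interval, if any."""
        if got_on_rvl and got_on_om:
            # Duplicate delivery: ask the peer to pause forwarding on the RVL.
            self.rvl_only_since = None
            return "SUSPEND"
        if got_on_rvl and not got_on_om:
            # The OM link looks broken, but traffic still arrives via the RVL.
            if self.rvl_only_since is None:
                self.rvl_only_since = now
            if now - self.rvl_only_since >= self.report_after:
                return "FAILURE_REPORT"     # situation persisted: tell the RP
            return None
        # Normal case (or no traffic at all): nothing to do.
        self.rvl_only_since = None
        return None
```

A node would call on_interval() once per observation period for each RVL it terminates; the failure report is only produced once the RVL-only situation has lasted at least report_after seconds.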



The Need for Fast Failure Discovery and Recovery. Robustness also 
requires that HBM node failures are rapidly discovered. This feature depends on 
the distribution topology in use (section 3.3):

— with a ring, one or two nodes in the ring must receive each packet twice, 
once in each direction. Otherwise there is a failure. 

— with a shortest path tree, ACK messages can be generated by the leaves and 
aggregated by the transit nodes as they are sent back to the source. A transit 
node that does not receive an ACK from one of its downstream neighbors 
can easily conclude that there is a failure. 

— using RVL provides a way to detect some failures rapidly. Yet multiple si- 
multaneous failures may not be detected. 

Each time a failure is detected, the topology is updated. Depending on the 
failure, this update can either completely reorganize the topology or just a subset 
of it (e.g. a partitioned area can be grafted onto the closest active transit node even
if the new topology is sub-optimal).

Note that failures are usually due to an application stopping or crashing, and
more rarely to link failures or WAN routing problems. Therefore a partition in the
overlay topology does not prevent individual nodes and the RP from communicating
using point-to-point connections.
The Need for Adaptation. Some of the nodes can turn out to be unstable
(e.g. a mobile node with a bad wireless connection). Even if HBM includes
redundancy and failure discovery mechanisms, instability must be taken into
account when creating the topology. The idea is to have stable transit nodes while
unstable ones are moved to the leaves of the topology. Of course the stability of
a node is unknown when it first joins a session. A default (conservative) value is
first assigned to the node stability variable, which is then regularly updated
as time goes by.

In order to make adaptation possible, we associate a “capability” with each
node. This capability has three possible values: disconnected, leaf_only (i.e.
is a nonCM), transit_possible (i.e. is a CM). We first calculate a normalized
capability, NCap:

\[ \mathrm{NCap}(node) = f(\mathit{user\_desires},\ \mathit{node\_stability},\ \mathit{RP\_param}) \]

where RP_param is a parameter specified by the RP to influence the capability
of a node (e.g. if all the users choose to be leaf_only, then the RP can oblige
some of them to be transit nodes). Then NCap(node) is compared to predefined
thresholds α and β in order to determine the exact capability of the node:

if (NCap(node) ∈ [0; α[), then the node is disconnected (exceptional if α is
small);

if (NCap(node) ∈ [α; β]), then the node has capability “leaf_only” (nonCM);

if (NCap(node) ∈ ]β; 1]), then the node has capability “transit_possible” (CM).

This is a lightweight mechanism as the RP already keeps per-node state 
information. It only adds four variables: the user_desires, the node stability
(dynamically updated), the RP_param and the node capability.
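
The threshold test can be illustrated with a short Python sketch. The function f is not defined in the text, so the ncap() helper below (a plain average scaled by the RP parameter) is purely an assumption made for illustration, as are the default values chosen for the thresholds alpha and beta.

```python
def ncap(user_desire: float, node_stability: float, rp_param: float) -> float:
    """Hypothetical stand-in for f(): scaled average, clipped to [0, 1]."""
    value = rp_param * (0.5 * user_desire + 0.5 * node_stability)
    return max(0.0, min(1.0, value))


def capability(ncap_value: float, alpha: float = 0.1, beta: float = 0.6) -> str:
    """Map a normalized capability to one of the three capability values."""
    if not 0.0 <= ncap_value <= 1.0:
        raise ValueError("NCap must lie in [0, 1]")
    if ncap_value < alpha:        # [0; alpha[
        return "disconnected"
    if ncap_value <= beta:        # [alpha; beta]
        return "leaf_only"        # nonCM
    return "transit_possible"     # ]beta; 1], CM


if __name__ == "__main__":
    for desire, stability in [(0.0, 0.1), (0.2, 0.4), (1.0, 0.9)]:
        value = ncap(desire, stability, rp_param=1.0)
        print(desire, stability, capability(value))
```

Only the interval boundaries follow the text; any monotonic combination of the three inputs could replace ncap().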

3.3 Possible Topologies 

So far OM work has essentially focussed on trees (e.g. [3]). In our work we consider
several potential topologies, each of them having distinctive features (figure 1):

bus: serial connection of all the nodes. 

tree: several kinds of trees are possible, like Shortest Path Trees (SPT) and
Minimum Spanning Trees (MST). An SPT is per-source, so with several
sources, several SPTs must be created, which turns out to be costly.
By contrast, an MST is source independent, which is an asset with (n, m)
group communications.

ring: solution of the “traveling salesman problem”. The topology is source
independent.

star: all the nodes are connected around a central node. 

sun: a “sun” is a “star” with a non null diameter. It is therefore composed of
an internal “ring” with peripheral “solar beams”.

hybrid: hybrid topologies are possible that mix for instance the “tree” and
“sun” solutions.
[Figure 1 panels: ring topology, sun topology]



Fig. 1. Some possible overlay topologies. 



Choosing one of these topologies has serious impacts on robustness and
performance. To analyze them we wrote a simulator. It takes as input a randomly
and homogeneously distributed³ set of nn nodes. A topology solver is run,
creating MST and Ring topologies. Each topology is then analyzed, failures are
introduced, and statistics are gathered. Experiments are repeated 30 times for
each value of nn.




[Figure 2 x-axis: number of nodes in topology]



Fig. 2. Impacts of a single node failure on the connectivity of a MST. 



Robustness in Front of a Node Failure. We simulated the impacts of
a single node failure on the connectivity when the OM topology consists of a
Minimum Spanning Tree (MST). Results are shown in Figure 2. For each value
of nn, we successively shut down each node. We then measure the number of
hosts still connected, cn, and plot the {min/aver/max} values of cn.

This experiment shows that with a Minimum Spanning Tree, a single node
failure can easily partition the whole OM topology. Although on average 62 to 84% of
nodes remain connected, this value can be as low as 30%. By contrast, with
a ring a single failure does not partition the topology.
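
A minimal Python sketch of this kind of experiment is given below. It is not the simulator used for Figure 2: the helper names are hypothetical, nodes are random points in the unit square, and "hosts still connected" is assumed to mean the size of the largest surviving component, which the text does not state explicitly.

```python
import math
import random


def prim_mst(points):
    """Build a Euclidean Minimum Spanning Tree; returns an adjacency dict."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = {i: set() for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].add(parent[u])
            adj[parent[u]].add(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def ring(n):
    """Ring over nodes 0..n-1 (node order does not matter for connectivity)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def largest_component_without(adj, failed):
    """Size of the largest connected component once `failed` is removed."""
    seen = {failed}
    largest = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        largest = max(largest, size)
    return largest


def failure_stats(adj):
    """(min, average, max) fraction of survivors still connected together."""
    n = len(adj)
    fractions = [largest_component_without(adj, f) / (n - 1) for f in adj]
    return min(fractions), sum(fractions) / n, max(fractions)


if __name__ == "__main__":
    rng = random.Random(2001)
    pts = [(rng.random(), rng.random()) for _ in range(30)]
    print("MST :", failure_stats(prim_mst(pts)))
    print("ring:", failure_stats(ring(30)))
```

With any tree, removing a leaf leaves all survivors connected while removing an internal node splits the tree, which is why the MST range is wide; removing one node from a ring always leaves a connected path.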

³ In case of non-homogeneous node distributions, e.g. to simulate the impacts of a
trans-atlantic line, the Traveling Salesman solver that creates the ring topology must 
be modified to take it into account. [9] page 445 gives such an algorithm. 
Fig. 3. Average delay for the Minimum Spanning Tree (MST) and Ring OM topologies.



Performances in Terms of Delay. [3] introduces several metrics to assess
the quality of an overlay topology, which can be used during the topology
creation process. In this section, we only focus on average group-shared delays
[11].

The latter, for a given set of nn nodes and an OM topology, is given by:

\[
\mathrm{aver\_delay}(nn, topo) = \operatorname{mean}_{\text{all possible sources}} \Big( \operatorname{mean}_{\text{all nodes} \neq \text{source}} \big( \mathrm{delay}(source \rightarrow node) \big) \Big)
\]

Not surprisingly, above 15 nodes an MST has a lower average delay than a ring.
Yet the MST delay range is much wider and, up to 65 nodes, there are situations
where the MST average delay is higher than the ring average delay.
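
As an illustration of this metric, the following Python sketch computes the average group-shared delay of a small overlay. It assumes symmetric link weights and that packets follow the shortest path over the overlay links (for a ring, the shorter of the two directions); the function names and the toy topologies are hypothetical.

```python
import heapq


def delays_from(overlay, source):
    """Shortest overlay delay from `source` to every node (Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in overlay[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_delay(overlay):
    """Mean over all sources of the mean delay to all other nodes."""
    per_source = []
    for source in overlay:
        dist = delays_from(overlay, source)
        others = [dist[node] for node in overlay if node != source]
        per_source.append(sum(others) / len(others))
    return sum(per_source) / len(per_source)


if __name__ == "__main__":
    # Four nodes, unit-weight overlay links.
    bus = {"A": {"B": 1}, "B": {"A": 1, "C": 1},
           "C": {"B": 1, "D": 1}, "D": {"C": 1}}
    ring = {"A": {"B": 1, "D": 1}, "B": {"A": 1, "C": 1},
            "C": {"B": 1, "D": 1}, "D": {"C": 1, "A": 1}}
    print("bus :", average_delay(bus))
    print("ring:", average_delay(ring))
```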

4 Related Works 

Yoid [6] is another OM scheme. Although Yoid and HBM both rely on an RP, many
differences exist. In particular Yoid creates a tree and uses a complex algorithm 
to avoid the creation of loops. Yoid also assumes that all nodes are stable. 

AMRoute [2] [8], developed for ad hoc networks, establishes an overlay topology
for multicast communications. AMRoute distinguishes two kinds of topology:
the mesh, a highly interconnected topology, and the tree, a subset of the mesh
used for efficient data delivery. Although AMRoute does not try to evaluate
inter-host distances, its use of an Expanding Ring Search (ERS) algorithm takes
locality into account. While convenient in an ad hoc network where wireless
communications enable diffusion, ERS is not feasible in the Internet unless
multicast is already available!

In NARADA [3] a mesh is first created and then a Reverse-Path Forwarding 
algorithm (e.g. DVMRP) is run on top of it to create a SPT per source. Many 
differences exist with HBM: there is no global view of the topology, it requires 
the use of an incremental mesh improvement technique, and per-node adaptation 
is not possible (a problem in case of highly heterogeneous nodes). 
In [7] an application layer routing architecture (ALR) is automatically created
using an active network framework (ALAN). The topology creation process
follows a hierarchical approach (for improved scalability) and relies on several
metrics.

5 Conclusions 

This work introduces an overlay multicast solution, HBM, which offers a group 
communication service to all hosts, even those located in a site where inter- 
domain multicast is not available. It discusses the issues raised by the creation 
and the management of this overlay topology. We argue that a centralized solu- 
tion, where the group membership is known by a Rendez-vous Point (RP), has 
many benefits over distributed solutions. Having a centralized topology manage- 
ment is simple (no coherency problem), efficient (the RP can create an optimal 
topology with respect to the distance database accuracy) and takes advantage 
of known node features during the topology creation process (e.g. to have stable
transit nodes). We also argue that having additional redundant connections,
even if it introduces additional traffic and loops, is important given the
intrinsically fragile nature of an overlay topology.
A An Algorithm for the Addition of Redundant Virtual
Links (RVL) 

This section describes a scheme to add RVL into the logical topology to decrease 
the probability of partitioned topology after a single node failure. The RP defines 
how many redundant links to create by using the following algorithm: 

// initialization
acceptable_threshold = 0    // or another value in [0; 1[
AddRedundantLinks(virtual_topo, all_members_in_virtual_topo);

// recursive solver
AddRedundantLinks(topology, group)
{
    if (proba(partitioned_topo, 1 failure) <= acceptable_threshold)
        exit;      // solution found, no need to go further

    if (number of members in group <= 2)
        return;

    find the set N of farthest CM nodes in logical group;
    foreach (2 nodes, N0 and N1, in N) {

        1. split the logical topology into two subgroups (subg1, subg2)
           such that each subgroup includes either N0 or N1 and all the
           nodes that are closer to it than to the other in the PHYSICAL
           topology (not necessarily on the virtual topology!)

        2. calculate the probability of partitioned topology in the case
           of a node failure before and after adding the RVL between N0
           and N1.

        if (new_proba(partitioned_topo, 1 failure) >=
            previous_proba(partitioned_topo, 1 failure))
            continue;  // do not add this link, try the next pair
        else {
            add this redundant link N0 <-> N1 to topology;
            AddRedundantLinks(topology, subg1);   // continue with subg1
            AddRedundantLinks(topology, subg2);   // continue with subg2
        }
    }
}



Figure 4(a) describes the physical topology and figure 4(b) the initial OM tree
that has been created. In this tree the two farthest nodes are G and D. The
subgroup (sub1) for D is A, B, C, E and F. The subgroup (sub2) for G is H and
I⁴, as described in figure 4(c). The probability of partitioned topology without
and with the addition of the RVL G↔D is 1 and 3/8 respectively. Therefore,
the redundant virtual link G↔D is accepted. As this probability is still greater
than zero, the RP repeats this algorithm on each of the two subgroups.

For sub1, there are two pairs of farthest nodes: D with F and D with E, as
shown in figure 4(d). As adding the RVL D↔F does not reduce this probability,

⁴ I is closer to G than to D on the physical topology!



A Host-Based Multicast (HBM) Solution for Group Communications 619 




[Figure 4 panels: (a) initial physical topology; (b) a possible virtual topology; (c) addition of the G-D redundant link; (d) addition of the E-D redundant link; (e) addition of the H-I redundant link]



Fig. 4. Addition of Redundant Virtual Links (RVL), an example. 



it is not accepted. By contrast, adding RVL D↔E reduces this probability
from 3/8 to 2/8 and is accepted. The same algorithm is then executed on each
subgroup of sub1, namely sub11 and sub12. As the probability cannot be further
reduced, the analysis of sub1 is finished.

For sub2, the algorithm leads to the addition of link H↔I as shown in
figure 4(e). As the new probability reaches zero, the analysis of sub2 finishes.

At the end, with three RVL, G↔D, E↔D, and H↔I, the probability of having
a partitioned topology after a single node failure is zero.

In the previous example, the algorithm runs until the probability of partitioning
reaches zero. In practice, we can define an “acceptable partitioning probability”:
acceptable_threshold > 0, and stop the solver when this value has been reached.
Abstract. Core-based multicast trees use less router state, but have significant
drawbacks when compared to shortest-path trees, namely higher
delay and poor fault tolerance. We evaluate the feasibility of using multi- 
ple independent cores within a shared multicast tree. We consider several 
basic designs and discuss how using multiple cores improves fault toler- 
ance without sacrificing router state. We examine the performance of 
multiple-core trees with respect to single-core trees and find that adding 
cores significantly lowers delay without increasing cost. Moreover, it takes 
only a small number of cores, placed with a k-center approximation, for
a multiple-core tree to have lower delay than a single-core tree with op- 
timal core placement. We also find that traffic concentration is avoided 
as long as the load is spread among a set of cores. These results indicate 
that shared trees with multiple active cores are a viable alternative to 
shortest-path trees. 



1 Introduction 

Multicast routing protocols are built using two basic types of trees: single-source 
shortest-path trees and shared core-based trees. In each case, a set of senders 
wants to deliver data to a set of members, known as the multicast group. With 
shortest-path trees, a separate tree is built for each source, using the least-cost 
paths between the source and the members. With a shared tree, one tree is built 
for the entire group and is shared among all the senders. Core-based trees are 
a simple way to build shared trees; a single router is chosen as the core, and a 
shortest-path tree is built from the core to the members. Senders transmit data 
toward the core until it reaches the tree. 

Shared trees have a significant advantage over single-source trees in that
only one routing table entry is needed for an entire group, instead of one per
source. Hence, BGMP uses shared trees for interdomain multicast to conserve
state within the Internet backbone.

Despite this advantage, core-based trees have a number of drawbacks relative 
to shortest-path trees. Foremost among these is that core-based trees on average 
impose a higher delay between a source and the group members [2]. This is because
packets often must travel first to the core and then to the group members,

* This work was supported in part by the National Science Foundation under Grant 
ANI-9977524 and NCR-9714680. 
and the core may not be along the shortest path to each member. In addition,
the core is a single point of failure; although PIM uses a list of backup cores,
members may experience significant additional delay when a core fails. Finally,
using core-based trees may cause traffic concentration, in which some links
in the network are much more heavily utilized than others.

Surprisingly, very little research has been conducted to study the possibility 
of using multiple cores to ameliorate these problems. The designers of both PIM 
and CBT considered using multiple cores, but chose to use a single core
early in the design stage. OCBT [7] uses a hierarchy of cores, in which cores
at lower levels join to their parent in the higher level, forming a tree. OCBT’s
use of multiple cores helps to avoid looping problems that were present in initial
designs of CBT. However, because the cores cooperate to form a single
tree, this structure does not behave any differently than a single-core tree with 
respect to delay, fault tolerance, or traffic concentration. 

In this paper we demonstrate the promise of building shared multicast trees 
with multiple, independent cores. Each core is the center of a separate multicast 
tree, and there is no coordination or dependency among cores. This design
improves the fault tolerance of the shared trees and can significantly improve 
performance. 

Our results show that using multiple cores decreases the delay experienced 
by group members, since there is a greater possibility that each member will 
have a core near its shortest path. Furthermore, we achieve the surprising result 
that a small set of cores placed with a k-center approximation can produce a tree
with lower delay than a single-core tree with optimal core placement. Finally, 
we show that multiple cores can spread the load of multicast traffic, eliminating 
the problem of traffic concentration. Based on these results, we conclude that 
shared trees using multiple cores are a significant improvement over single-core 
trees and thus a viable alternative to shortest-path trees. 

We begin by examining two alternatives for multiple-core trees and describe 
how these protocols can be implemented. Then we present the results of an 
extensive simulation study examining the performance of these designs with 
respect to delay, cost, and traffic concentration. 

2 Multiple-Core Designs

We consider two possible designs for multiple core trees and their implications 
for designing a multicast routing protocol. Both of these basic designs result in 
at most one copy of each packet being delivered to the group members. 

2.1 Alternative Designs 

Our multiple-core designs share some basic functionality, namely each core is the 
root of its own bidirectional, shared multicast tree, spanning some or all of the 
group members. Senders that are not group members transmit packets toward 
the core until they reach a router that is part of the bidirectional tree. Using 
this as a foundation, we explore these two designs: 
1. Senders-To-All. Each sender transmits data to all the cores; members join
to only one of the cores. Fig. 1 illustrates how the Senders-To-All protocol
works. In this example, there are three cores, each of which is the center of 
a separate, bidirectional tree connecting a subset of the group members. A 
sender, marked by S, is also a member and has joined to core 1. When it 
transmits a packet, it sends 3 copies, one toward each core, until they reach 
some router on the tree for that core. To receive packets, a member chooses 
a core and joins that core’s shared tree. 

2. Members-To-All. Each sender transmits data to just one of the cores;
members join all of the cores. Fig. 2 illustrates how the Members-To-All
protocol works. As with the previous example, there are three cores, but 
this time all members have joined to all the cores. In effect, this creates 
n redundant trees, and the sender chooses only one of them to transmit 
its data. In this example, sender S is also a member so it is joined to all 
of the trees. When it transmits a packet, it can send it on any one of the 
three trees, and can in fact choose a different tree for each packet. Likewise, 
different senders can utilize different cores. 

We are also investigating a third design, in which the senders and members 
both use only one core, and the cores distribute multicast packets among them. 
Distribution among cores could be done using a spanning tree, a ring, or some 
other structure. We do not consider this design in this paper. 



2.2 Senders-to-All Advantages 

The Senders-To-All design has several significant advantages when compared
to Members-To-All. First, Senders-To-All will on average use less router state,
which we define as the number of routing entries per group at a given node. 
For both designs there is one tree per core, but for Members-To-All, each tree 
connects all of the members. Thus, for Members-To-All each router is likely to 
have one entry per core for each group, particularly as the group size grows. For 
Senders-To-All, on the other hand, each router is likely to have only one entry 
per group, especially if nearest attachment is used. 
The Senders-To-All design also has the advantage of giving group members
control over choosing a core. With Members-To-All, the source is responsible for
choosing a core, then monitoring its status so that it can switch to a new core
in case of failure. With Senders-To-All, members have greater flexibility, since a
given core may be good for some members and bad for others, depending on its 
location and the status of the network. With Senders-To-All, members can react 
quickly to failures and can even switch cores in order to improve performance 
characteristics such as loss and delay. 

It is also worth noting that, compared to a protocol such as PIM, both of 
our designs reduce the delay incurred when a core fails. Since the cores are al- 
ready active (either receiving data or having members joined), the time required 
to switch cores after a failure is reduced. Moreover, both designs localize the 
recovery delay to only those senders or members who are using the failed core. 

2.3 Using Multiple Cores 

In order to use multiple cores, group members (or their first-hop routers) need 
a mechanism to discover the identities of cores and select the one they will use. 

We refer to the first of these issues as core advertisement. For single-core
protocols such as PIM, the current method for advertising cores is to distribute a
set of candidate cores throughout the network. When a source or group member 
needs to send to or join a group, it selects one of these candidate cores to act 
as the core for the group. All of the sources and group members deterministi- 
cally select the same core because PIM uses a hash function based on the group 
identifier. 

Our multiple-core designs can use this same mechanism to distribute candi- 
date cores and to choose a different set of active cores for each group. The PIM 
hash function produces a different ordering of cores for each group and PIM uses 
this ordering to select a backup core should the primary core fail. Similarly, a 
multiple-core protocol can use the hash function to select a set of n cores, along 
with backups for each of them. 
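
The idea of deriving a per-group ordering of candidate cores from a hash can be sketched as follows. This is not the PIM hash function: SHA-256 is used here only as a convenient stand-in, and the function names and example addresses are hypothetical.

```python
import hashlib


def core_ordering(group: str, candidates):
    """Order candidate cores deterministically for a given group address."""
    def score(core: str) -> int:
        # Stand-in hash (assumption): any function that every router
        # computes identically from (group, core) would do.
        digest = hashlib.sha256(f"{group}|{core}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    # Ties are broken on the core address so the ordering is total.
    return sorted(candidates, key=lambda core: (score(core), core))


def select_cores(group: str, candidates, n: int):
    """Return (active cores, backup cores) for the group."""
    order = core_ordering(group, candidates)
    return order[:n], order[n:]


if __name__ == "__main__":
    candidates = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "10.0.3.1", "10.0.4.1"]
    for group in ("224.1.1.1", "224.2.2.2"):
        active, backups = select_cores(group, candidates, n=3)
        print(group, "active:", active, "backups:", backups)
```

Because every router evaluates the same function on the same candidate list, all sources and members agree on the active set without exchanging extra messages; when an active core fails, the next entry of the ordering can take over.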

Once the set of cores is known, the group members must decide which core 
to utilize. For the Senders-To-All design, members must decide which core to 
receive packets from. Likewise, for the Members-To-All design, senders must 
decide which core to send packets to. In most cases, choosing the nearest core 
should give good performance. If a core becomes congested, however, then a 
member or sender may want to switch to a different core. 

Finally, the placement of the cores can impact the cost and delay of the 
multicast tree. We examine the effects of several core placement algorithms in 
the next section. 

2.4 Implementation Details 

The primary issue that must be addressed to implement our multiple core designs 
is packet forwarding. The current multicast routing architecture allows for only 
one shared tree per group. However, in our designs, there are several or many 
Fig. 3. Using a Bidirectional Tree at a Core

shared trees per group (one per core). These trees may overlap and thus must 
be identified and built individually. 

To facilitate our discussion we must first explain the current types of routing 
entries used for multicast. Currently, a multicast routing entry may be designated 
using either (S, G) or (*, G), where S is the source address of a multicast packet
and G is the group address. If a multicast packet matches an (S, G) entry, routers
assume a shortest-path tree is being used; the packet must arrive on the incoming 
interface listed in the entry and is sent on all outgoing interfaces. Likewise, 
if a multicast packet matches a (*,G) entry, routers assume a shared tree is 
being used and the packet is forwarded accordingly (either on a unidirectional 
or bidirectional shared tree). 

Multiple core trees, as described above, require a new type of multicast routing
entry. We call this a (C, G) entry, where C is the IP address of the core.
A router that receives a multicast packet matches it against (C, G) entries the
same as it would for an (S, G) entry. However, if a match is found, then the
packet is forwarded on a bidirectional tree; that is, it is sent on all interfaces 
listed except for the interface on which it arrived. In practical terms, this change 
may only require a few bits in the routing entry to specify how forwarding should 
be performed. 
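
The forwarding rule for the new entry type can be illustrated with the short Python sketch below. The Router class, its method names and the interface names are hypothetical; the sketch only models the lookup keyed by (core, group) and the "all listed interfaces except the arrival one" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router holding per-(core, group) bidirectional-tree entries."""
    entries: dict = field(default_factory=dict)  # (core, group) -> interfaces

    def join(self, core: str, group: str, iface: str) -> None:
        """Record that `iface` is part of the tree rooted at `core`."""
        self.entries.setdefault((core, group), set()).add(iface)

    def forward(self, core: str, group: str, in_iface: str) -> list:
        """Interfaces on which a packet for (core, group) is sent out."""
        ifaces = self.entries.get((core, group))
        if ifaces is None:
            return []                       # no matching entry
        # Bidirectional tree: every listed interface except the arrival one.
        return sorted(ifaces - {in_iface})


if __name__ == "__main__":
    r = Router()
    for iface in ("eth0", "eth1", "eth2"):
        r.join("core1", "224.1.1.1", iface)
    r.join("core2", "224.1.1.1", "eth3")
    print(r.forward("core1", "224.1.1.1", "eth1"))
    print(r.forward("core2", "224.1.1.1", "eth3"))
```

Keying the table on the core address keeps the trees of different cores separate even when they overlap on the same router.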

Fig. 3 illustrates how packets are forwarded on a multiple-core tree using
this new type of routing entry. When a source sends a packet to the group, its 
first-hop router encapsulates it and unicasts the packet toward its nearest core. 
When it reaches a router on the core’s bidirectional tree, this router removes the 
original packet and then re-encapsulates it, this time using the core’s address as 
the source address. This packet is multicast along the bidirectional tree until it 
reaches the leaf routers. These routers remove the original packet and deliver it to 
their local members. Note that this process requires packets to be encapsulated 
twice. If a sender is also a member, then only one encapsulation is needed because 
the unicasting step is eliminated. 

The steps for receiving packets depend on whether the Senders-To-All or
Members-To-All design is being used. For Senders-To-All, a member’s first-hop
router chooses one core and joins the bidirectional tree for that core. For
Members-To-All, the first-hop router joins the shared trees for all of the cores. These trees
Fig. 4. Delay for Random Core Selection and Nearest Attachment: Senders-To-All
(left) and Members-To-All (right) 



are joined in the usual manner, that is, by sending a separate join message toward 
the core for each tree. 

3 Simulation Study 

We evaluate the feasibility of multiple-core trees by comparing their performance 
to single-core trees, as has been done in several previous studies comparing tree
types and core selection algorithms. The factors in our experiment include
group size (from 5 to 50), core selection (random and dominating set), and core 
attachment (random and nearest). The metrics we evaluate include cost, delay 
and traffic concentration. We report ratios to the corresponding SPT metrics, 
so that we can compare results across different graphs and groups. 



3.1 Experiment Results: Delay 

For these experiments, we use a set of 10 flat, random graphs of 50 nodes each, 
using the Waxman model within the GT-ITM topology generator. All
edges have unit weights and the average node degree is near 4. For each graph, 
we generate 50 random groups and measure the delay experienced by group 
members. We define delay as the number of links traversed between one sender 
and one member in the group. We calculate the maximum delay for each sender, 
then average these numbers to find the average-maximum for the group. 
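
To make the average-maximum metric concrete, here is a minimal Python sketch on a unit-weight graph. It relies on a simplifying assumption that is not part of the paper: the delay through a core is taken as hops(sender, core) + hops(core, member), ignoring the fact that a packet may reach the bidirectional tree before it reaches the core. Nearest attachment is assumed, and all names are hypothetical.

```python
from collections import deque


def hops_from(graph, src):
    """Hop counts from `src` to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_max_delay(graph, senders, members, cores, design):
    """Average over senders of the maximum sender-to-member delay."""
    nodes = set(senders) | set(members) | set(cores)
    hops = {x: hops_from(graph, x) for x in nodes}

    def nearest_core(node):
        return min(cores, key=lambda c: (hops[node][c], c))

    maxima = []
    for s in senders:
        worst = 0
        for m in members:
            if m == s:
                continue
            # Senders-To-All: the member picked the core (its nearest one).
            # Members-To-All: the sender picks the core (its nearest one).
            core = nearest_core(m) if design == "senders_to_all" else nearest_core(s)
            worst = max(worst, hops[s][core] + hops[core][m])
        maxima.append(worst)
    return sum(maxima) / len(maxima)


if __name__ == "__main__":
    # Path 0-1-2 with an extra node 3 attached to node 1.
    graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
    print(average_max_delay(graph, [0], [2], [3], "senders_to_all"))
    print(average_max_delay(graph, [0], [2], [1, 3], "senders_to_all"))
```

In the toy graph, adding a core on the shortest path between sender and member brings the delay down to the shortest-path value, which is the effect the experiments measure at scale.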

We find that both Senders-To-All and Members-To-All can significantly reduce
the delay experienced by group members when nearest attachment is used.
As shown in Fig. 4, the delay decreases dramatically as the number of cores
increases. This is because there is a greater chance that the member (or sender) 
will choose a core that is close to the shortest path. 

Particularly interesting in these results is that, for nearest core attachment,
most of the benefits are seen with only 5 cores, after which the graphs are mostly
flat. Thus only a small number of nodes (about 10%) need to be used as cores
for a group.

Fig. 5. Cost for Senders-To-All with Random Core Selection: Nearest Attachment (left)
and Random Attachment (right)

Our results also show that the Members-To-All design is tolerant of members
choosing distant cores: with random attachment, delay increases only slightly as
more cores are added. The Senders-To-All design, on the other hand, can suffer
from large maximum delays when using random attachment. In the extreme case,
with both a large group and a large number of cores, there is a good chance that
a member will choose to use some core that is distant from both itself and the
sender. With the Senders-To-All design, the sender must transmit to all cores,
so it will use the distant core and incur a large maximum delay.

3.2 Experiment Results: Cost 

To evaluate cost, we use the same experiment setup as for delay. We define cost 
as the number of links in a tree, representing the bandwidth consumed by one 
packet transmission. We calculate cost separately for each sender, then average 
these costs together to find the average cost for the whole group. Note that for
Senders-To-All, we count each link for each packet sent; thus if three packets are 
sent to three different cores, it may be possible for one link to be counted three 
times. 

We find that the Members-To-All design performs better than Senders-To-All
with respect to cost. Sending packets using Members-To-All does not consume
much more bandwidth than a shortest-path tree. The cost ratio is nearly flat and
close to 1 for both random and nearest attachment. In fact, the Members-To-All
design actually performs slightly better than a single-core tree in this regard.
This is because a sender is able to choose a core whose performance is very close 
to that of a shortest-path tree. 

Senders-To-All also performs well with regard to cost as long as nearest
attachment is used (see Fig. 5). The only exception is that the cost increases for
small groups with many cores, due to the number of copies generated on links
close to the sender.

Fig. 6. k-Center Placement and Nearest Attachment: Delay Ratio (left) and Cost
Ratio (right)

Senders-To-All does not perform well with random attachment; as shown in
Fig. 5, the cost ratio doubles as the number of cores increases. This happens
because the sender generates a separate packet for each core and transmits each 
copy on a separate, but possibly overlapping tree. The copies will often travel 
some of the same links, multiplying the amount of bandwidth consumed by the 
sender. 

3.3 Experiment Results: Core Selection 

To examine the effects of core selection, we used both a dominating set algorithm 
and a k-center algorithm. A dominating set is a subset of the nodes in the graph 
such that all nodes are within n hops of this set. A k-center algorithm fixes the 
number of cores to be equal to k, then tries to place them so as to minimize 
the distance from all nodes to the cores. Both finding a minimal dominating set
and choosing an optimal placement of cores are NP-hard. We use approximation
algorithms based on node degree, and our experiments indicate these are good 
approximations in random graphs. 
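The text does not spell out the approximation algorithms used, so the following is only a sketch of one common greedy heuristic for the dominating set variant: repeatedly pick the node that covers the most still-uncovered nodes within `radius` hops (for `radius=1` this favours high-degree nodes). The function names and example graphs are assumptions made for illustration.

```python
from collections import deque

def ball(adj, center, radius):
    """Nodes within `radius` hops of `center`, including the center itself."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the allowed hop count
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def greedy_dominating_set(adj, radius=1):
    """Greedy heuristic: choose cores until every node is within `radius` hops of one."""
    balls = {v: ball(adj, v, radius) for v in adj}
    uncovered = set(adj)
    cores = []
    while uncovered:
        # Ties are broken by the smallest node label, which keeps the result deterministic.
        best = max(sorted(adj), key=lambda v: len(balls[v] & uncovered))
        cores.append(best)
        uncovered -= balls[best]
    return cores

# Hypothetical examples: a star (one hub suffices) and a 6-node path.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(greedy_dominating_set(star))  # [0]
print(greedy_dominating_set(path))  # [1, 4]
```

A greedy choice like this gives no optimality guarantee on a specific graph; it is shown only to make the notion of covering all nodes within n hops concrete.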

We find that both dominating set placement and k-center placement improve
the delay and cost ratios for multiple-core trees. Delay is reduced by 10 to 20%
for the different designs; the cost ratio is already close to 1, so the improvement
is slight. Moreover, as the number of cores increases, multiple-core trees using
both random core selection and k-center placement can have lower delay than
a single-core tree with optimal core placement! Fig. 6 shows these results. In
this same figure we also show the cost ratio for the corresponding experiment.
Again, k-center placement is an improvement over random core placement. The
single-core tree with optimal core placement retains the lone advantage that it
can have lower cost than a shortest-path tree. This is in keeping with the result
from Calvert et al., in which optimal core placement is the only placement
mechanism they study where single-core trees have lower cost than shortest-path
trees.

3.4 Experiment Results: Traffic Concentration

For traffic concentration, we followed the methodology used by Wei and Estrin. 
For a given graph, we generate 300 groups and construct their multicast trees. 
Then, for each group, we transmit a single packet from each source. As each 
packet is sent, we count the number of times each link in the network is traversed. 
With multiple-core trees, it is possible that a given link is traversed more than 
once by a single packet, due to encapsulation. 
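The link counting step can be sketched as below. The route representation (one list of traversed links per packet) and the use of the maximum link load as a summary are assumptions made for this illustration; the source only states that link traversals are counted.

```python
from collections import Counter

def link_loads(packet_routes):
    """Count how many times each undirected link is traversed over all packets."""
    loads = Counter()
    for route in packet_routes:
        for u, v in route:
            # Normalise the link so (u, v) and (v, u) count as the same link.
            loads[tuple(sorted((u, v)))] += 1
    return loads

def max_link_load(packet_routes):
    """One possible concentration summary: the load on the busiest link."""
    loads = link_loads(packet_routes)
    return max(loads.values(), default=0)

# Hypothetical routes: the first packet crosses link a-c twice, as can happen
# when a packet is encapsulated to a core and then forwarded back down the tree.
routes = [
    [("s", "a"), ("a", "c"), ("c", "a"), ("a", "m")],
    [("t", "a"), ("a", "m")],
]
print(link_loads(routes)[("a", "c")])  # 2
print(max_link_load(routes))           # 2
```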

Our results show that as long as random core selection is used, neither
Members-To-All nor Senders-To-All suffers from traffic concentration, regardless of
the number of cores. This confirms the results of Calvert et al., indicating
that traffic concentration is only observed when a small number of cores is used 
across all groups. With random core selection, each group uses a different set of 
cores, so the traffic is spread throughout the network. 

We do observe traffic concentration when using k-center placement, because
in this case all groups use the same set of cores. In this situation, the effects of 
traffic concentration appear when there are fewer than 4 cores. As the number 
of cores increases beyond this amount, traffic concentration disappears. 



4 Conclusions and Future Work 

Our results indicate that multiple-core trees are a feasible alternative to shortest- 
path trees. They can have lower delay than trees using a single core and cost 
comparable to shortest-path trees. In addition, multiple-core trees do not suffer 
from traffic concentration, as long as a reasonably large set of candidate cores is 
used. An ISP may also use a k-center algorithm to choose a static set of cores;
this can reduce delay and will avoid traffic concentration as long as a large 
enough set is used. 

We are particularly interested in the Senders-To-All variant since it uses
less router state and provides group members with more flexible control when
reacting to failed cores and congestion. In most situations, the delay and cost of
the Senders-To-All design are close to those experienced with shortest-path trees,
as long as nearest attachment is used. In this case, its primary drawback is its
higher cost with small groups and large numbers of cores. We are exploring ways 
to reduce cost in this situation. Costs can increase when members choose distant 
cores, but a natural disincentive (higher delay) or policy rules will likely prevent 
this from happening. 

The Members-To-All variant has better performance in terms of cost and
delay, and is also tolerant of group members choosing distant cores. It is less 
flexible with regard to fault tolerance and uses more router state, but we still 
consider it a viable option for multicast. 

We are continuing to explore the design of multiple-core multicast protocols, 
particularly with respect to single-source multicast. We are also intrigued by 
the possibility of multiple core trees that use core distribution and are actively 
investigating the cost and delay attributes of these structures as well.



Abstract. In this paper we study the importance of group characteristics
in multicast communications, and we present a generalized multicast
routing schema called GMRP (Generalized Multicast Routing Problem).
The goal of GMRP is to provide an efficient multicast routing approach
based on the group characteristics. As a case study we consider
group dynamism, where we distinguish between fixed and dynamic
receivers, and we present an efficient routing algorithm called the Maximum
Degree Minimum Delay Algorithm that takes advantage of this group
characteristic.

Key words: Multicast routing, group characteristics, QoS, minimum
delay protocol.



1 Introduction 

Multicast routing has contributed widely to the deployment of multimedia group
applications on the Internet, such as teleconferencing, tele-education and
computer-supported collaborative work. Such applications often have stringent
quality of service (QoS) requirements. Unfortunately, most currently deployed
multicast protocols are based on shortest-path algorithms and hence
have no QoS capabilities. In addition, these protocols construct multicast trees
without regard to the type of the group itself. We believe that efficient trees
can be obtained only if we consider the different characteristics of the group. In
current multicast protocols, the only group characteristic that is treated as
a major factor is the dispersion of the group members. As a result, two modes
are defined: sparse mode for groups with few members separated by large
WANs, and dense mode for groups with many receivers concentrated in
the same area. Different protocols are used in the two modes to allow
efficient multicast routing. In the same way, many other group characteristics
should be considered in the multicast routing approach, such as the group
dynamism (fixed/dynamic members), the way members join the group (join-only,
join-leave), the size of the group and its composition (source-only, source-receiver,
receiver), the periodicity of the group (permanent, periodic, temporary) and
the QoS requirements of the group (delay constraint, bandwidth constraint,
homogeneous/heterogeneous QoS requirements for different members).

In this paper we propose to generalize the multicast routing problem to 
include different multicast group characteristics in the routing scheme. We call 
this problem the Generalized Multicast Routing Problem. 

To demonstrate the importance of group characteristics in multicast routing, we
present a case study of group dynamism, where we distinguish between fixed
and dynamic receivers. We present an appropriate algorithm called the Maximum
Degree Minimum Delay Algorithm, which is a greedy algorithm with a cost
function that combines link costs and node degrees. By simulating this algorithm
we show that we can take advantage of this group characteristic to provide more
efficient routing algorithms. We believe that this should be true for any other
group characteristic.

The rest of this paper is organized as follows. In Section 2 we define the
Generalized Multicast Routing Problem by presenting the network model, multicast
group characteristics and a formulation of the GMRP. In Section 3 we discuss a
case study of the GMRP, taking group dynamism as an example of an important
group characteristic, and we present an appropriate routing algorithm, called the
Maximum Degree Minimum Delay Algorithm (MDMDA), for building efficient
multicast source trees in this case. In Section 4 we describe the simulation model
that we use to compare multicast routing algorithms and we discuss the results
we obtained for our routing algorithm. The paper is concluded in Section 5.



2 GMRP: The Generalized Multicast Routing Problem 

In this section we present the different components of the Generalized
Multicast Routing Problem. We first describe the network model and the different link
and node parameters. We then enumerate different multicast group characteris- 
tics and we describe how they influence the efficiency of the multicast routing 
schema. Finally we give a general definition of the Generalized Multicast Routing 
Problem. 



2.1 Network Model 

A network is modeled by a connected graph $N = (V, E)$, where $V$ is a set of
nodes and $E$ is a set of directed links. To each link $e = (u, v) \in E$ we
associate three metrics: a cost $c_e$, a delay $d_e$ and an available bandwidth
$b_e$. The link cost may be hop count, a monetary cost or any other cost
function. The link delay includes the queuing, transmission and propagation
delays of a packet. For a path $P(u,v)$, the cost $C_{P(u,v)}$ is the sum of the
costs of the path links, the delay $D_{P(u,v)}$ is the sum of the delays of the
path links, and the available bandwidth $b_{P(u,v)}$ is the minimum available
bandwidth on the path links:

$$C_{P(u,v)} = \sum_{e \in P(u,v)} c_e$$

$$D_{P(u,v)} = \sum_{e \in P(u,v)} d_e$$

$$b_{P(u,v)} = \min_{e \in P(u,v)} b_e$$
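The three path metrics can be computed directly from the per-link values. The sketch below assumes a hypothetical representation in which each directed link maps to a `Link(cost, delay, bandwidth)` record; the names and numbers are illustrative only.

```python
from typing import NamedTuple

class Link(NamedTuple):
    cost: int        # link cost (hop count, monetary cost, ...)
    delay: int       # link delay
    bandwidth: int   # available bandwidth on the link

def path_metrics(path, links):
    """Return (cost, delay, available bandwidth) of a path given as a node list."""
    hops = list(zip(path, path[1:]))                    # consecutive directed links
    cost = sum(links[e].cost for e in hops)             # additive metric
    delay = sum(links[e].delay for e in hops)           # additive metric
    bandwidth = min(links[e].bandwidth for e in hops)   # bottleneck metric
    return cost, delay, bandwidth

# Hypothetical two-link path a -> b -> c.
links = {("a", "b"): Link(1, 5, 10), ("b", "c"): Link(2, 7, 4)}
print(path_metrics(["a", "b", "c"], links))  # (3, 12, 4)
```

Cost and delay add up along the path, whereas the available bandwidth is limited by the narrowest link.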

A multicast tree $T(G)$ is a tree spanning all members of the multicast group
$G$. The set of sources $S$ of the group $G$ is not necessarily included in the
tree, since a source may not be a member of the group. During the joining phase,
a receiver $r$ may specify a subset $S_r$ of $S$ from which it will receive
multicast traffic. The total cost of the tree $T(G)$ is the sum of the costs of
the links in the tree. The delay for the receiver $r$ is the maximum delay on the
paths $P(s, r)$ with $s \in S_r$. The available bandwidth used by the receiver $r$
is the minimum available bandwidth on the paths $P(s, r)$ with $s \in S_r$. The
total available bandwidth on the shared multicast tree $T(G)$ is the minimum
available bandwidth between a source and a receiver.

The delay diameter of the tree $T(G)$ is the maximum delay between a source
and a member of the group:

$$\mathrm{Diam}(T(G)) = \max_{s \in S,\; g \in G} D_{P(s,g)}$$

where $D_{P(s,g)}$ is the delay of the tree path from source $s$ to member $g$.

In Sections 3 and 4 we consider only source-based trees to simplify our case
study and simulation results, but everything we present can be extended to shared
trees.



2.2 Group Characteristics 

A multicast group $G$ is a set of nodes participating in the same multicast session.
We write $G = \{g_1, g_2, \ldots, g_n\} \subseteq V$ with $n = |G| \le |V|$. $G$ is
identified by a unique class D address. In currently deployed multicast protocols
we can distinguish two types of routing depending on the group mode, dense or
sparse. These two modes are based on group characteristics such as the dispersion
of members and the size of the group. In the following we discuss further
characteristics that influence multicast routing and that should be introduced
into the multicast routing problem.

Size of the group: This is an important characteristic of a multicast group.
The number of sources, the number of receivers and the number of
source/receiver members are important factors in the routing schema.

Group Dynamism: In many current multicast groups, receivers may be Fixed
(known, subscribed) or Dynamic (visitors). This is the case, for example, in a
tele-education session, where the audience may consist of subscribed students
of the class (Fixed receivers) and other visitors (Dynamic receivers) who can
occasionally join the group. We discuss in detail the influence of this
distinction between fixed and dynamic receivers in Section 3.

The Type of Session: A multicast session may be a Join-only or Join/Leave
session. The greedy and naive algorithms have been studied for
different session types.

Duration of the group: A multicast group may be a permanent, periodic or
temporary group. In each case a different kind of routing should be applied to
take advantage of this group characteristic and so to offer an efficient routing
algorithm. For example, in the case of periodic groups, a prior calculation of
routes as well as prior reservation may be a good routing choice. In the case
of permanent groups with a majority of fixed members, we can apply static
routing algorithms with periodic tree updates to keep the efficiency of the
tree within a certain level.

QoS requirements of the group: Depending on the multicast application,
the group may have stringent QoS requirements for different parameters such
as end-to-end delay, bandwidth, loss rate and jitter. Members of the same
group may have homogeneous or heterogeneous QoS requirements. These
group specificities should be considered during path calculation.

In Table 1 we summarize the above characteristics of a multicast group. We
note that this list may be completed by many other group characteristics that
may also influence the routing mechanism. We cite here only the important ones.



Table 1. Group characteristics

Group characteristic   Definition
---------------------  ------------------------------------------------
Group size             Number of sources, receivers, source-receivers
Group dynamism         % of fixed members, % of dynamic members
Session type           Join-only session, join/leave session
Group duration         Permanent, periodic, temporary
QoS requirement        No requirement, delay-bandwidth constraints, ...



2.3 GMRP Definition

The Generalized Multicast Routing Problem, GMRP, should be seen as a
generalization of any multicast routing schema. Given a network with the properties
described in Section 2.1, and a multicast group with specific characteristics as
described in Section 2.2, the GMRP consists of building an efficient tree for
such a group. The efficiency of the tree depends on the group characteristics and
on which routing parameters we want to optimize. An efficient tree may be a tree
that accepts a maximum number of receivers within a certain end-to-end delay
constraint; it may also be a tree with minimum total cost. It would be practically
impossible to have one generic routing algorithm that produces efficient trees for
all types of groups, but our goal is to build a set of generic routing mechanisms
that may cover most current group models. Figure 1 presents a general view of
the GMRP.

Fig. 1. The Generalized Multicast Routing Problem (group characteristics are fed
to routing mechanisms, which produce efficient multicast trees)



3 Case Study: GMRP and Group Dynamism 

In this section we discuss the influence of group dynamism on multicast routing.
We first formulate group dynamism and then present an efficient algorithm
that takes advantage of this group characteristic to build the multicast tree. Our
algorithm is based on the well-known greedy algorithm.



3.1 Group Dynamism 

In many multicast groups we have two types of members: known members (in 
most cases all sources and a set of receivers) and unknown (dynamic) members 
who can freely join the group at any time. To formulate this important group 
characteristic we propose the following notation for a multicast group G: 

$G = S \cup SD \cup DD$ with:

G : The multicast Group. 

S : The set of Sources of G. 

SD: The set of Static Destinations (known, fixed receivers) of G. 

DD: The set of Dynamic Destinations (unknown receivers) of G.



3.2 GMRP and Group Dynamism 

Given a network N = (V, E) which has the properties given in Section 2.1,
and a multicast group G with the group dynamism characteristics described in
Section 3.1, the GMRP consists of constructing an efficient tree for such a
group. In this case the efficiency of the tree may be measured by the number
of dynamic destinations that the tree can accept under a certain delay or cost
constraint. The general goal is to construct a tree that connects the maximum
number of destinations while minimizing the total cost and the average
end-to-end delay of the tree.

We propose to split the GMRP into two sub-problems. The first sub-problem
consists of constructing a Partial Static Tree (PST) that connects only the
Sources and the Static Destinations of the group. The second sub-problem
consists of dynamically connecting the set of Dynamic Destinations one by one in
their order of arrival.

Constructing the Partial Static Tree: The Partial Static Tree should be
constructed in such a way that a maximum set of Dynamic receivers can join the tree
later on with a minimum cost and a minimum end-to-end delay. The Partial
Static Tree need not be optimal itself but should lead to an optimal final tree
after all group members (static and dynamic) have joined the group.

Dynamic join of Dynamic Destinations: After constructing the Partial Static
Tree, members of the Dynamic Destinations set should be added one by one in
the order of their arrival. The join mechanism should lead to an efficient final
tree.



3.3 The Maximum Degree Minimum Delay Algorithm: MDMDA 

To construct the Partial Static Tree, we propose to use the Static Greedy
Algorithm together with the following specific cost function. To each link
$e = (u, v) \in E$ we associate the cost $COST(e)$:

$$COST(e) = \frac{d_e}{deg_v}$$

where $d_e$ is the delay of the link $e$ and $deg_v$ is the degree of node $v$, the
node on which the link $e$ is incident. This cost function leads to the
construction of a Partial Static Tree with high-degree nodes while maintaining a
reasonable end-to-end delay and a reasonable total cost. Such a tree can accept
further dynamic receivers at minimum cost.
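As a rough numerical illustration, taking $COST(e)$ as the link delay divided by the degree of the link's end node, consider three candidate links. The delays and degrees here are invented for the example and do not come from the paper: $e_1$ has delay 10 ms into a node of degree 2, $e_2$ has delay 10 ms into a node of degree 5, and $e_3$ has delay 90 ms into a node of degree 6.

$$COST(e_1) = \frac{10}{2} = 5, \qquad COST(e_2) = \frac{10}{5} = 2, \qquad COST(e_3) = \frac{90}{6} = 15$$

At equal delay the greedy step prefers $e_2$, which leads into the better-connected node, while the slow link $e_3$ remains the most expensive choice despite its high-degree end node.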

To add the dynamic destinations to the Partial Static Tree we propose to
use the Dynamic Greedy Algorithm together with the delay $d_e$ as the cost
function associated with each link $e = (u, v) \in E$.

Figure 2 shows the importance of having a Partial Static Tree with high-degree
nodes. Here the Partial Static Tree APS1, which has a higher average node
degree, can accept dynamic nodes 2 and 5 at lower cost than APS2 can. Even though
the cost of APS1 is higher than the cost of APS2, the final tree produced by APS1
after adding dynamic destinations 2 and 5 will have a better cost.




Fig. 2. The Maximum Degree Minimum Delay Algorithm: example (legend: Source,
Static Destination, Dynamic Destination, APS)

To simplify the presentation of the algorithm we take the case of source-based
trees. Table 2 presents the Maximum Degree Minimum Delay Algorithm
(MDMDA) in this case.

Table 2. Maximum Degree Minimum Delay Algorithm (MDMDA) 

Maximum Degree Minimum Delay Algorithm (MDMDA)

1. START from the source.
2. REPEAT
     i.  SELECT among the non-connected Static Receivers in SD the one closest
         to the current tree, using the new cost function COST().
     ii. JOIN the selected Static Receiver to the current tree.
   UNTIL all Static Receivers in SD are connected to the Partial Static Tree.
3. REPEAT for each Dynamic Receiver in DD
     i.  JOIN this Dynamic Receiver to the current tree using the delay as
         cost function.
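The steps in Table 2 can be sketched in Python as below. This is an illustrative reading under several assumptions, not the authors' implementation: the graph is treated as undirected, "closest to the current tree" is computed with a multi-source Dijkstra search, and the helper names and the small example network are hypothetical.

```python
import heapq

def nearest_path(adj, tree_nodes, targets, weight):
    """Multi-source Dijkstra from the current tree to the closest target node."""
    dist = {u: 0.0 for u in tree_nodes}
    prev = {}
    heap = [(0.0, u) for u in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:      # walk back until a tree node is reached
                path.append(prev[path[-1]])
            return path[::-1]
        for v in adj[u]:
            nd = d + weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    raise ValueError("no target is reachable from the tree")

def mdmda(adj, source, static_receivers, dynamic_receivers):
    """adj[u][v] is the delay of link (u, v); returns the list of tree links."""
    degree = {v: len(adj[v]) for v in adj}
    cost = lambda u, v: adj[u][v] / degree[v]   # COST(e) = d_e / deg_v
    delay = lambda u, v: adj[u][v]              # plain delay for dynamic joins
    tree_nodes, tree_links = {source}, []

    def join(targets, weight):
        path = nearest_path(adj, tree_nodes, targets, weight)
        tree_links.extend(zip(path, path[1:]))
        tree_nodes.update(path)

    remaining = set(static_receivers) - tree_nodes
    while remaining:                            # step 2: build the Partial Static Tree
        join(remaining, cost)
        remaining -= tree_nodes
    for r in dynamic_receivers:                 # step 3: joins in order of arrival
        if r not in tree_nodes:
            join({r}, delay)
    return tree_links

def undirected(edges):
    """Build a symmetric adjacency map from (u, v, delay) triples."""
    adj = {}
    for u, v, d in edges:
        adj.setdefault(u, {})[v] = d
        adj.setdefault(v, {})[u] = d
    return adj

# Hypothetical network: h is a hub of degree 4, a is a lower-delay relay of degree 2.
net = undirected([("s", "a", 2), ("a", "r1", 2), ("s", "h", 3),
                  ("h", "r1", 2), ("h", "d1", 1), ("h", "d2", 1)])
tree = mdmda(net, "s", ["r1"], ["d1", "d2"])
print(tree)  # [('s', 'h'), ('h', 'r1'), ('h', 'd1'), ('h', 'd2')]
```

In this example the static receiver r1 is reached through the hub h even though the route through a has a lower delay (4 versus 5), so both dynamic receivers can later attach with a single link each. The resulting tree has a total link delay of 7, compared with 8 if r1 had been connected through a.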



4 Simulations 

In this section, we provide an overview of our simulation model and some of
the results we obtained by comparing the Maximum Degree Minimum Delay
Algorithm (MDMDA) with the classical greedy algorithm (TM) and the
Reverse Shortest Path algorithm (RSP) used in DVMRP.



4.1 Simulation Model 

In these simulations we examine the effect of the network size and the average
node degree on the end-to-end delay and the total cost for a given multicast
group. We compare the Maximum Degree Minimum Delay Algorithm (MDMDA)
with the classical greedy algorithm (TM), where the static version is applied for
fixed members and the dynamic version is applied for dynamic members. We also
compare MDMDA with the Reverse Shortest Path algorithm (RSP).

Simulations are carried out over a set of random Euclidean graphs generated
using a modified version of the Waxman algorithm proposed by Salama.
A random Euclidean graph is generated by distributing nodes uniformly on a
Euclidean plane and adding edges between nodes on a probabilistic basis. We used
graphs with 50 to 300 nodes and with average degree varying from 3 to 6. The
default average degree is 4.

We used a bimodal distribution to assign delays to edges. We assign high
delay values uniformly distributed within [90 ms, 100 ms] to 20% of the links, and
to all the rest we assign delay values uniformly distributed within [1 ms, 10 ms].
In [13] the author suggested that Internet traffic load is skewed, with most links
underutilized and a few links heavily congested. It seems logical to express this
in terms of link delay, since a highly loaded link is likely to experience a
high delay.
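The bimodal delay assignment can be sketched as follows. The function name and parameters are hypothetical, and the sketch assumes that exactly 20% of the links (rounded) are drawn at random to receive a high delay; the text would also be consistent with marking each link as high-delay independently with probability 0.2.

```python
import random

def assign_bimodal_delays(edges, high_fraction=0.2,
                          high=(90.0, 100.0), low=(1.0, 10.0), seed=None):
    """Map each edge to a delay in ms drawn from one of two uniform ranges."""
    rng = random.Random(seed)
    edges = list(edges)
    n_high = round(high_fraction * len(edges))
    high_edges = set(rng.sample(edges, n_high))   # links treated as congested
    delays = {}
    for e in edges:
        lo_ms, hi_ms = high if e in high_edges else low
        delays[e] = rng.uniform(lo_ms, hi_ms)
    return delays

# Hypothetical example: a ring of 10 links.
ring = [(i, (i + 1) % 10) for i in range(10)]
delays = assign_bimodal_delays(ring, seed=1)
print(sum(1 for d in delays.values() if d >= 90.0))  # 2 of the 10 links are slow
```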

In our simulations the multicast group represents 80% of the total size of 
the network and it is formed by 20% of static receivers and 80% of dynamic 
receivers. 

Each value in our simulation is an average over 200 iterations on different
generated graphs, which gives a satisfactory confidence level for our
experiments.



4.2 Results 

In the first series (Figure 3) we compare the average end-to-end delay and the
total tree cost while varying the network size from 50 to 300 nodes. MDMDA
provides trees with a total cost slightly smaller than that of both TM and RSP,
but the more interesting gain is in the average end-to-end delay. For example, with
a 200-node network, MDMDA achieves an end-to-end delay that is 10% lower than
that of TM and 60% lower than that of RSP.




Fig. 3. Average end-to-end delay and Total tree cost as a function of the network size 



In the second series (Figure 4) we compare the average end-to-end delay and
the total tree cost while varying the average node degree for networks of 100
nodes. For average node degrees varying between 3 and 6, the Maximum Degree
Minimum Delay Algorithm (MDMDA) constructs more efficient trees than
the greedy algorithm (TM) and the Reverse Shortest Path algorithm (RSP).

Fig. 4. Average end-to-end delay and Total tree cost as a function of the average node
degree



5 Conclusion and Future Work 

In this paper we formulated a generalized multicast routing schema called GMRP
(Generalized Multicast Routing Problem). GMRP aims to provide efficient
routing mechanisms based on the multicast group characteristics. To demonstrate
the importance of group characteristics in multicast routing, we presented a case
study of group dynamism, where we distinguish between fixed and dynamic
receivers. We presented an appropriate algorithm called the Maximum Degree
Minimum Delay Algorithm, which is a greedy algorithm with a cost function that
combines link costs and node degrees. Simulation shows that MDMDA
outperforms the classical greedy algorithm and the Reverse Shortest Path algorithm.
More work should be carried out to study the influence of other group
characteristics on multicast routing, and to bring all these characteristics and
routing mechanisms together into a complete picture of the Generalized Multicast
Routing Problem.



Abstract. Some type of transmission rate control is required in order to support
multicast video applications over the Internet. Previously, we proposed a new
protocol, the Layered Multicast Control Protocol (LMCP), which utilizes both
the video sender and the receivers to control the rate of the video transmission.
One weakness of this approach is that feedback from all receivers is required in
order for the video source to determine an optimal transmission rate.

In this paper we introduce an algorithm that allows us to determine the video
source's transmission rates based on feedback from a subset, or sample, of the
receivers. As our analysis shows, this algorithm allows us to support
hundreds of receivers and allows the sender to determine transmission rates
that are nearly optimal. Based on the distributions analyzed, we were able to
calculate transmission rates that achieved a displayable video rate within 5% of
the optimal setting for 500 receivers with a worst-case feedback rate at the
source of 2 kbps.



1 Introduction 

As the Internet’s networking capabilities have increased, there has been a push to 
utilize these capabilities for non-traditional applications. One of these non-traditional 
applications is multicast videoconferencing. A multicast videoconferencing 
application consists of a sender, transmitting video to a heterogeneous group of 
receivers. In a distributed environment such as the Internet, it is very difficult for this 
type of application to determine the availability of network resources. To perform 
this function some type of monitoring and control process is necessary. This process
is responsible for dynamically picking transmission rates that best meet the changing
needs of the set of receivers while taking into account competing applications. To
simplify our discussion, we assume that the multicast videoconferencing traffic
is separated from normal (TCP/UDP) traffic. This may be done by utilizing a
separate queue (DiffServ) [1] in the routers for video traffic.

One of the difficulties in multicasting a video stream over the Internet is determining
the rate of transmission that best meets the receivers' requirements. Previous
rate-control approaches can be divided into two types. The first is a sender-based
rate-adaptation approach [2, 3], where the sender multicasts a single video signal
and adjusts its transmission rate based on feedback from the receivers. This
approach works well if all receivers have similar network resources, but it
performs poorly if the receivers have large differences in their available network
resources.

The second approach is a receiver based rate-control protocol [4, 5, 6, 7, 8]. In 
this solution, the single video stream is split into multiple segments in order to 
transmit the stream across multiple multicast groups. The receivers are then 
responsible for adding and dropping the multicast channels to best meet their 
available resources. The sender is not dynamic and transmits the video stream at pre- 
determined (fixed) rates. While this approach may better meet the diverse 
requirements of the receivers, due to its lack of sender rate-adaptation, it will not 
maximize the utilization of the receivers’ bandwidth. An example of this approach is 
the Receiver-Driven Layered Multicast protocol (RLM) [7]. 



1.1 Layered Multicast Control Protocol (LMCP) 

In [9, 10], we presented a new protocol called the Layered Multicast Control Protocol
(LMCP) that combines the strengths of these previous two approaches. More
specifically, this protocol utilizes both the multiple channel transmission concept from
the receiver-based approach (RLM) and the transmission rate adaptation concept from
the sender-based approach. The receivers not only perform the basic RLM protocol,
they also determine their available bandwidth and provide this as feedback to the
sender. The sender processes the receivers' feedback to determine the optimal rate of
transmission for each of the multicast layers. These transmission rates are optimal in
the sense that they maximize the displayable video at the receivers. In [10], we
analyzed four different maximization algorithms for setting the video source's
multicast transmission rates and showed that our protocol significantly improves the
amount of video displayed. In [9], we extended the LMCP by providing a router-based
technique that allows each receiver to determine its feedback rate based on the
available bandwidth on the path between the video source and the receiver. We call
this feedback rate the receiver's bottleneck rate since it is the minimum (or
bottleneck) rate on the source-receiver path.

The LMCP video source's control algorithm picks transmission rates that
maximize the displayable video for the receivers as an aggregate rather than for
any individual receiver. To show the effectiveness of this approach, in [10] we
introduced the metric percentage-used. This metric is calculated as the sum of the
bandwidth received by all N receivers divided by the sum of the total possible
bandwidth they could receive. The higher the percentage-used the better, with a
percentage-used of 1 meaning that all receivers are receiving at their maximum
rate. The calculation of this metric is:

\[
\text{Percentage-used} = \frac{\text{Total\_received}}{\text{Total\_possible}}
= \frac{\sum_{i=1}^{N} \bigl(r_i - (r_i - t_j)\bigr)}{\sum_{i=1}^{N} r_i}
= \frac{\sum_{i=1}^{N} k_i}{\sum_{i=1}^{N} r_i}
\tag{1}
\]

where N is the total number of receivers, $r_i \in R$ is receiver i’s bottleneck rate,
$t_j \in T$, where T is the video source’s set of transmission rates, is the maximum
transmission rate such that $t_j \le r_i$, and $k_i = r_i - (r_i - t_j)$.
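The metric can be illustrated with a short sketch. This is not the authors' implementation: the class and method names are hypothetical, each transmission rate is treated as the cumulative rate a receiver obtains by subscribing up to that layer, and a receiver whose bottleneck rate lies below the lowest transmission rate is assumed to display nothing.

```java
import java.util.Arrays;

/** Sketch of Equation 1 (hypothetical names, not from the paper). */
final class PercentageUsed {

    /**
     * Largest transmission rate t_j that does not exceed the bottleneck
     * rate r_i, i.e. k_i. Returns 0 when even the lowest rate is too high
     * (an assumption; the paper does not discuss this case).
     * Expects sortedRates in ascending order.
     */
    static double displayable(double bottleneck, double[] sortedRates) {
        double best = 0.0;
        for (double t : sortedRates) {
            if (t <= bottleneck) {
                best = t;
            } else {
                break;
            }
        }
        return best;
    }

    /** Sum of k_i over all receivers divided by the sum of r_i. */
    static double compute(double[] bottlenecks, double[] rates) {
        double[] sorted = rates.clone();
        Arrays.sort(sorted);
        double received = 0.0;
        double possible = 0.0;
        for (double r : bottlenecks) {
            received += displayable(r, sorted);
            possible += r;
        }
        return possible == 0.0 ? 0.0 : received / possible;
    }

    /** Percentage-wasted, defined in the text as 1 - percentage-used. */
    static double wasted(double[] bottlenecks, double[] rates) {
        return 1.0 - compute(bottlenecks, rates);
    }
}
```

For example, with bottleneck rates of 100, 200, 300 and 400 kbps and transmission rates of 100 and 300 kbps, the receivers display 100, 100, 300 and 300 kbps, giving a percentage-used of 800/1000 = 0.8.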

One weakness with the LMCP is that as the number of receivers grows into the 
hundreds, the amount of feedback arriving at the sender will increase beyond the 
sender’s ability to receive and process the information. There are two approaches for 
dealing with large amounts of feedback at the video source. First, the control period, 
which is the amount of time between successive runs of the sender’s control 
algorithm, may be lengthened. This will allow the receivers to increase the time they 
wait before transmitting their feedback packet and will decrease the overhead of the 
feedback on the sender. The downside to this approach is that it reduces the 
responsiveness of the control protocol to changes in available bandwidth. The second 
approach is to reduce the number of receivers providing feedback in a control period.
In this approach, a random subset of the receivers generate and transmit their 
feedback during a control period. The sender uses this subset of feedback in order to 
calculate its transmission rates. In this paper we develop an algorithm for 
implementing this type of approach through statistical sampling and discuss the 
accuracy of this approach. 

The remainder of the paper is organized as follows. In the next section we 
introduce a statistical sampling algorithm for determining the video source’s 
transmission rate. We then provide analytical results of this new approach and our 
conclusions. 



2 Statistical Sampling Algorithm 

As we mentioned earlier, one weakness of the LMCP is that as the number of
receivers grows into the hundreds, the amount of feedback arriving at the sender will
increase beyond the sender’s ability to receive and process the information. In this
section we are interested in determining the number of receivers which must be
sampled in order to allow the video source to determine transmission rates which
achieve a percentage-used to within an error of $\varepsilon$ and a confidence of $1-\alpha$ of the true
percentage-used. By true percentage-used we mean the percentage-used achieved
when the video source runs its maximization function using feedback from all
receivers. Stating this formally, we are looking for the smallest sample size, n, such
that $\Pr(|\hat{P} - P| > \varepsilon) < \alpha$, where $P$ is the true percentage-used and $\hat{P}$ is the
percentage-used achieved using only a subset of the receivers’ feedback.

One difficulty in calculating this n is the question of independence between the two
variables, $k_i$ and $r_i$, in the calculation of percentage-used. In order to analyze the
independence of these two variables we ran multiple simulations using receiver
bandwidth distributions ranging from highly clustered to randomly distributed. We
then calculated the correlation coefficient, $\rho$, between the two variables, $k_i$ (the
amount of video displayed at receiver i) and $r_i$ (the bottleneck rate on the path
between the video source and receiver i), used in equation 1. The calculation of $\rho$
was based on Pearson’s product moment correlation coefficient as given in [16]. Our
results showed that there is a very high correlation between these two variables
regardless of the receivers’ bandwidth distribution.
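For reference, Pearson's product moment correlation coefficient between two equally long samples can be computed as in the following sketch. The class and method names are hypothetical, and the inputs stand in for the per-receiver values of the two variables being compared.

```java
/** Sketch of Pearson's product moment correlation (hypothetical names). */
final class PearsonCorrelation {

    /** Returns rho in [-1, 1] for paired samples x and y. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Accumulate the co-deviation and the two squared deviations.
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            throw new IllegalArgumentException("correlation undefined for a constant sample");
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```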

As an alternative to maximizing percentage-used, the sender’s control algorithm 
may minimize the percentage-wasted; where percentage-wasted = 1 - percentage- 
used and is calculated as: 



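\[
\text{Percentage-wasted} = \frac{\text{Total\_wasted}}{\text{Total\_possible}}
= \frac{\sum_{i=1}^{N} (r_i - t_j)}{\sum_{i=1}^{N} r_i}
= \frac{\sum_{i=1}^{N} w_i}{\sum_{i=1}^{N} r_i}
\tag{2}
\]

where $r_i \in R$ is receiver i’s bottleneck rate, $t_j \in T$ is the maximum transmission rate
such that $t_j \le r_i$, and $w_i = r_i - t_j$.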
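As a quick check, the relation percentage-wasted = 1 - percentage-used follows directly from the definitions used in Equations 1 and 2, since $k_i = r_i - (r_i - t_j) = t_j$ and therefore $w_i = r_i - k_i$:

```latex
\[
\frac{\sum_{i=1}^{N} w_i}{\sum_{i=1}^{N} r_i}
= \frac{\sum_{i=1}^{N} (r_i - k_i)}{\sum_{i=1}^{N} r_i}
= 1 - \frac{\sum_{i=1}^{N} k_i}{\sum_{i=1}^{N} r_i}
\]
```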

We may then restate the problem as looking for the smallest n such that
$\Pr(|\hat{W} - W| > \varepsilon) < \alpha$, where $W$ is the true percentage-wasted and $\hat{W}$ is achieved
using only a subset of the receivers’ bottleneck rates. Based on our analysis we found
that the correlation coefficient, $\rho$, for the two variables $w_i$ and $r_i$ is significantly lower.
Therefore, we may assume that the distribution of percentage-wasted is
asymptotically normal for sufficiently large n.



2.1 Derivation of Our Sample Size n 

In order to save space we have omitted the complete derivation of n, our sample size. 
The complete derivation may be found in the extended version of this paper [18]. Our 
equation for n using W, the percentage-wasted metric is: 



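[Equation (3): a lower bound $n \ge \ldots$ on the sample size, expressed in terms of the
normal quantile $z_{1-\alpha/2}$ and of the variances $\sigma_w^2$ and $\sigma_r^2$ and means $\mu_w$ and $\mu_r$;
the complete derivation is given in [18].]    (3)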



2.2 Sender’s Algorithm 

Equation 3 allows us to determine n, the sample size needed to achieve a $\hat{W}$ with an
error of $\varepsilon$ and a confidence level of $1-\alpha$. One difficulty with this equation is that it
requires knowledge about all of the receivers. Specifically, we need to know the entire
set of receivers’ feedback, set R, which is of size N, in order to derive our sample size
n. This means we would need to collect the entire set of receivers’ feedback in order
to determine how big a sample we need. To overcome this limitation we have
developed an iterative algorithm. This algorithm is shown in Fig. 1. The algorithm is
iterative in the sense that we continually estimate $\hat{n}$ based on a subset of the receivers’
feedback rates and adjust our sample size according to this newly calculated value.
At the same time the sender uses the receivers’ feedback to calculate its transmission
rates.

1) Estimate N, the total number of receivers in the videoconference, using techniques
given in [17].

2) Initiate the process to receive feedback from s receivers using techniques given in
[17].

3) Estimate $\hat{n}$, using Equation 3, based on the receivers’ feedback obtained in step 2.

4) If $\hat{n}$ > s then set s = $\hat{n}$.

5) Utilize feedback to determine the sender’s transmission rates.

6) Go to step 2.

Fig. 1. Sender’s Iterative Algorithm to Determine Transmission Rates

In order to obtain a subset of the receivers’ bottleneck rates our algorithm utilizes a 
technique developed in [17]. Their approach utilizes probabilistic probing in order to 
estimate the number of receivers in the videoconference. In addition, they provide an 
algorithm to obtain a fixed amount of feedback from a randomly determined subset of 
receivers. We utilize these techniques in steps 1 and 2 in order to obtain the feedback 
necessary to calculate our new transmission rates. 

In step 3 of our algorithm we calculate $\hat{n}$ utilizing equation 3 and the feedback
obtained in step 2. For this calculation we are using $\varepsilon = .05$ and $\alpha = .05$. It
should be noted that this is only an estimate of n since we are utilizing only a subset
of the receivers in our calculation. While an averaging of the $\hat{n}$’s might more
accurately represent the true n, in order to be conservative in step 4 we only adjust our
sample size s to larger values of $\hat{n}$. In step 5 of our algorithm we utilize the
feedback to determine the sender’s transmission rates.
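The control loop of Fig. 1 can be sketched as follows. This is only an outline under stated assumptions: the three hooks (a sampler that returns feedback from s randomly chosen receivers, an estimator standing in for Equation 3, and a rate chooser standing in for the sender's maximization function) are hypothetical interfaces, and none of them is implemented here. The estimate of N from step 1 is omitted.

```java
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

/** Sketch of steps 2 to 5 of Fig. 1 (hypothetical names and hooks). */
final class SamplingSender {
    private int sampleSize;                                  // s in Fig. 1
    private final IntFunction<double[]> sampler;             // step 2: feedback from s receivers
    private final ToIntFunction<double[]> estimator;         // step 3: placeholder for Equation 3
    private final Function<double[], double[]> rateChooser;  // step 5: maximization function

    SamplingSender(int initialSampleSize,
                   IntFunction<double[]> sampler,
                   ToIntFunction<double[]> estimator,
                   Function<double[], double[]> rateChooser) {
        this.sampleSize = initialSampleSize;
        this.sampler = sampler;
        this.estimator = estimator;
        this.rateChooser = rateChooser;
    }

    /** Runs one control period and returns the new transmission rates. */
    double[] runControlPeriod() {
        double[] feedback = sampler.apply(sampleSize);   // step 2
        int nHat = estimator.applyAsInt(feedback);       // step 3
        if (nHat > sampleSize) {                         // step 4: only ever grow s
            sampleSize = nHat;
        }
        return rateChooser.apply(feedback);              // step 5
    }

    int sampleSize() {
        return sampleSize;
    }
}
```

Calling `runControlPeriod()` once per control period corresponds to step 6. Because step 4 only raises s, the sample used in a given period is the one requested before the new estimate was available.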



2.3 Real-Time Transport Protocol (RTP)

An alternative to the algorithm presented in Fig. 1 is the one used by RTP. RTP
provides for feedback scalability via its RTP Control Protocol (RTCP) [11]. In this
approach the amount of feedback for all receivers is limited to a predetermined
“session bandwidth”. In RTCP, feedback packets are multicast to all group members,
both senders and receivers. Multicasting is necessary in order for each receiver to be
able to determine the current video session size and therefore the amount of
bandwidth being used by the RTCP. As the number of receivers grows, the receivers
increase the amount of time before sending a feedback packet, which decreases the
overhead of the feedback packets.
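The way the reporting interval stretches with group size can be illustrated with a simplified sketch. It assumes the commonly recommended RTCP parameters (5% of the session bandwidth reserved for RTCP and a 5 second minimum interval) and deliberately ignores the sender/receiver bandwidth split and the randomization that real RTCP implementations apply. The class and method names are hypothetical.

```java
/** Simplified sketch of RTCP interval scaling (hypothetical names). */
final class RtcpInterval {
    static final double RTCP_FRACTION = 0.05;        // assumed share of session bandwidth
    static final double MIN_INTERVAL_SECONDS = 5.0;  // assumed minimum reporting interval

    /**
     * Time each member waits between reports so that the aggregate RTCP
     * traffic stays within its share of the session bandwidth.
     * Sizes are in bytes, bandwidth in bytes per second.
     */
    static double seconds(int members, double avgPacketBytes, double sessionBandwidth) {
        double rtcpBandwidth = RTCP_FRACTION * sessionBandwidth;
        double interval = members * avgPacketBytes / rtcpBandwidth;
        return Math.max(MIN_INTERVAL_SECONDS, interval);
    }
}
```

With 100-byte reports and a 16000 byte/s session, 10 members stay at the 5 second floor, while 500 members each wait about 62.5 seconds between reports.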

There are two drawbacks to this approach. First, all links in the network carrying
video traffic will experience the overhead of the entire “session bandwidth” amount
of traffic since the feedback is multicast to all receivers. In comparison, in the LMCP
approach the source controls the amount of feedback required. This allows our
algorithm to unicast feedback packets between the receiver and the source, which
reduces the overhead of feedback packets on the network. Second, the RTP feedback
approach does not take into account the distribution of the bandwidth at the receivers.
In the RTCP approach, as the session size grows the amount of time a receiver
waits to send its feedback increases. This technique does not take into account the
effect of the reduced feedback on the sender’s control algorithm. In the LMCP
approach we continually calculate the amount of feedback required, allowing us to
scale the feedback based on the current distribution of the receivers’ bandwidths.



3 Performance Analysis 

In order to determine the effectiveness of the algorithm given in Fig. 1, we have 
analyzed its performance with hundreds of distributions. The receiver bottleneck rate 
distributions varied from a small number of clusters, to very clustered, to uniformly 
distributed. 

There are two key steps in our algorithm. The first is the calculation of $\hat{n}$ in
step 3. Due to the correlation between $w_i$ and $r_i$ we have found that a fairly large s,
our minimum sample size, is necessary in order to achieve a realistic initial $\hat{n}$. For
our analysis we have initially set s = 120. Fig. 2 shows histograms of the distribution
of the receivers’ bottleneck rates used in our first set of simulations. The bottleneck
rate represents the slowest rate on the path between the source and each receiver. As
these figures show, we looked at receiver distributions ranging from clustered to
uniform. Fig. 3 shows histograms of the calculated $\hat{n}$ for 60 iterations of the
sender’s algorithm for the 6 receiver bottleneck rate distributions shown in Fig. 2.
The number shown (e.g. true n=?) on each histogram is the true value of n calculated
using all 500 receivers. As we can see from these histograms, the estimated value $\hat{n}$
tends to be normally distributed around the true n. This is what we would expect and
means that equation 3 gives us realistic estimates of n for the given distributions.

In order to understand the overall effectiveness of the algorithm we need to look at 
its effect on the metric percentage-used. Fig. 4 shows the calculated value for the 
metric percentage-used for 50 different receiver bandwidth distributions. This 
percentage-used was based on the transmission rates as determined by our sampling 
algorithm (Fig. 1). This allows us to see how accurately our algorithm performs. The 
50 different receiver bottleneck rate distributions ranged from slightly clustered to
uniformly distributed. Fig. 4 shows the variations in the metric percentage-used
using these 50 distributions. The short lines in this graph represent the difference
between the true or ideal percentage-used achieved using all N receivers’ feedback
and the worst-case or minimum percentage-used achieved for 60 iterations of our
algorithm. As this graph shows, our algorithm achieved nearly optimal
performance for all distributions, with a maximum variation for any of the 50
distributions of .05. In addition, our estimated sample size $\hat{n}$ never exceeded 200
for the 50 distributions and was below 100 for the majority of the distributions.

Fig. 2. Histograms of the Receivers' Bottleneck Rate Distributions, with N = 500 and
bottleneck rates in kilobits per second (kbps)



4 Summary 

In this paper we address the issue of the scalability of the receivers’ feedback for large 
multicast videoconferences. In our previous work we developed the LMCP to control 
the transmission rate of the video source. One weakness of this approach is that it 
required feedback from all receivers in order to maximize the receivers’ displayable
video. In this paper we introduced an algorithm that utilizes a sample of the
receivers’ feedback in determining the source’s transmission rates.

This algorithm first estimates a sample size n such that $\Pr(|\hat{P} - P| > \varepsilon) < \alpha$, where
$P$ is the true percentage-used and $\hat{P}$ is calculated based on our sample. The
algorithm then samples the receivers’ feedback rates and determines the sender’s
transmission rates. Our analysis showed that this algorithm achieved a $\hat{P}$ which
varied by at most .05 for the distributions presented. In addition, for the 50 distributions
presented our maximum feedback rate was 2 kbps for 500 receivers.

Fig. 3. Histograms of the calculation of $\hat{n}$ for the six different receiver bandwidth
distributions found in Fig. 2. The calculation of $\hat{n}$ was done with N = 500,
$\varepsilon = .05$ and $\alpha = .05$. Each histogram represents 60 iterations of the algorithm.

Abstract. In this paper we propose a group communication architecture
for the Internet. This architecture is implemented as a middleware
which provides a separation between a general group communication service
model and the multicast mechanisms used to implement this service.
This middleware provides a web-oriented framework for implementing
end-to-end group communication services as constrained by specific application
needs and available network technologies.



1 Introduction 

Multicast is a means of one-to-many communication. It is essential for scalable 
group communication as it allows a group member to communicate once
with an abstract group and yet effectively reach multiple other members of the
group. The main objective of multicast is to eliminate wasteful duplicates in the 
network when sending the same information to multiple receivers. 

Group applications today operate without multicast by using a repeated 
unicast scheme, in which a data sender opens a separate unicast connection 
to each data receiver. This is expensive to both the sender and the network, 
imposing a high bandwidth-delay product on transmissions at the sender and
creating as many duplicates of the data sent as there are receivers in the group. 

In the Internet context, multicast research has focused around IP multicast 
m, a network-layer solution for distributing data from a sender to multiple 
receivers. IP multicast ensures that a packet sent to a multicast group will only 
traverse a given physical link in the network at most once, independent of the 
number of receivers of that packet. 

IP multicast has met with only moderate success however, due in part to the 
slow pace of network-level deployment, but more importantly due to difficulties 
related to protocol standardization and concerns about network security in the 
presence of a widely available IP multicast service. 

Recent research efforts including mm have proposed a hybrid multicast 
approach, coined Application Layer Multicast (ALM) which performs multicast 
routing at the application layer. ALM approaches aim to perform multicast 
distribution between endsystems involved in a group communication using only
unicast network primitives. 

While the efficiency of ALM protocols is acknowledged as being less than 
that of IP multicast 0, this approach is promising as it approaches IP multicast 
efficiency without the difficulties or dangers of infrastructural deployment. 

One area in which ALM is being successfully applied is in the new field 
of content distribution companies. Relay servers are placed at strategic points 
in the Internet, closer to a clustering of customers than the originating server. 
“Content” (such as streamed video and audio, or proxy data) is relayed by 
unicast to the server, to which clients also connect using unicast. The content 
distribution company is thereby able to reach a much larger number of clients 
at reduced cost to both the company and the network than if each client were
being served directly from the originating server.

Multiple relay servers can be connected using unicast tunnels to provide a 
virtual multicast network over the Internet, for which reason ALM is also known 
as overlay multicast. The MBone |H| is one example of an overlay network. 

Many group applications exist today which are forced to use a repeated 
unicast scheme and would benefit greatly from multicast distribution. Common 
examples that we use daily include real-time multimedia streaming, multi-player 
games or forums and audio-visual conferencing/teaching software. 

These diverse applications all use a common group communication service,
but have very different and often conflicting requirements of this service. Consider,
for example, a multimedia streaming application which is highly delay sensitive
but which can permit loss, compared to a file-sharing application which
tolerates some delay but no loss. No single protocol implementing the service
could meet these diametrically opposed requirements.

The Internet Multicast Architecture (IMA) proposes an abstract service
model for group communication in the Internet. It is conceived as a middleware
layer which allows protocols to implement the service as constrained by an
application’s service requirements and the available network services.

This allows the development of protocol instances in a consistent group communication
framework. It enables a protocol to fulfill an application’s specialised
group communication requirements, and to provide this same service in a transparent
way using IP multicast or Application Layer Multicast.

The rest of this paper describes the IMA, and the separation of group communication
and multicast. Section 2 presents the IMA group communication
service model for the Internet. Section 3 presents a middleware which implements
this service. Section 4 describes concurrent ALM projects which propose
an Internet group architecture. Section 6 concludes with some interesting new
possibilities created by Application Layer Multicast, which we are pursuing
in the framework of the Internet Multicast Architecture.

Fig. 1. A service model for group communication



2 A Group Communication Service Model 

We define group communication as the communication that takes place between 
an individual and some abstract group representing other participants in that 
communication. The implication is that application data sent by one participant 
to the abstract group is distributed to all the other participants.¹

In the following we define an abstract service model for group communication
in the Internet. This model describes the IMA framework for group creation,
group addressing and discovery, and group communication. This service model
is independent of both the multicast technique(s) used to distribute application
data between members of the group and the service. These are defined in protocol
instances which implement the group model. Section 3 describes some protocol
instances developed for the IMA and the services they provide.

¹ We do not wish to say that all group communication is reliable, as some services like
IP multicast are best-effort. We mean that some attempt is made to deliver data
sent by one participant to all the others.

The group communication service model defines three group communication
entities:

1. A rendezvous or meeting point for group members. The rendezvous provides
a central point for group management. The rendezvous’s functions include
group creation and teardown and group access control. The rendezvous
is not, however, involved in data distribution between the members of the
group.

2. A member. A group will have many members. A member is a participant
which can send to, and/or receive from, a group. A member joins a group
by addressing itself to the rendezvous. If permitted to join the group, it is
provided with protocol-specific information which allows it to communicate
directly with the group.

3. A group. The group is an abstraction of all the other members which have
joined at the rendezvous. Once a rendezvous has created a group, and a
member has joined it, the member communicates with the other members
by way of the abstract group. The specifics of communication with a group
are left to individual protocol instances.

The interaction of these entities is shown in Figure 1.

In Figure 1, a rendezvous host opens a group on a local port. The address
of the rendezvous and the port number define a globally unique address for the
group.

A group member joins a group by connecting to a known group address. 
Once accepted, it may exchange datagrams with the abstract group. The actual 
distribution of this application data from a member to the group depends on 
the actual protocol instance implementation. 

A group member leaves the group by re-contacting the rendezvous. Only the 
rendezvous may close the group. 

3 Group Communication Middleware 

The IMA implements the group communication service model of Section 2 as a
middleware layer, shown in Figure 2. This middleware provides a homogeneous
group communication service over different available network technologies. This 
is useful in the current Internet situation where group applications are used 
globally but IP multicast is being deployed and activated incrementally. 

The IMA middleware defines a set of abstract group communication primitives
which together make up what we call a GroupSocket. A GroupSocket provides
the group communication interface between a group member and the rest
of the group. A Java definition of an abstract GroupSocket object is listed in
Figure 3.

An IMA protocol instance implements this abstract class. An implementation
is based on the available network technology (Application Layer Multicast, IP
multicast, or a combination), and the service requirements of a class of group
applications. The service requirements also define the roles of different group
members, and partition the availability of the GroupSocket primitives accordingly.
We are currently investigating four protocol instances with different service
requirements and network constraints:

1. Reliable Streaming : A reliable, one-to-many bulk data transfer service, 
similar to that of the Reliable Multicast Transfer Protocol II 0 . This proto- 
col instance uses an overlay tree of TCP connections to transmit files reliably 
from a single sender to multiple receivers. It defines two roles: a sender-rendezvous
which creates and sends to a group, and a receiver which joins
and passively receives data from a group. 

2. Reliable Messaging : A reliable, many-to-many communication service, 
similar to that of the Scalable Reliable Multicast protocol [Z|. This proto- 
col instance uses an overlay ring structure of TCP connections to provide 
partially-ordered communication between its members. It defines two roles: 
A rendezvous-server which manages the group, and a client which can both 
send and receive packets reliably to the group. 

3. Basic Service over IP multicast: An unreliable datagram single-source
multicast service. This protocol instance is a lightweight wrapper to IP multicast.
A source controls access to the group and acts as a sender, and a
receiver passively receives.

4. Basic Service over UDP: Another unreliable datagram single-source multicast
service. This protocol instance, however, provides the basic service using
an Application Layer Multicast tree structure and simple (unicast) UDP services.

The IMA approach enables us to break out of the shortest path tree approach
imposed by IP multicast. It encompasses ALM and overlay network approaches,
which are able to use alternative distribution methods, hopefully better suited to
the targeted application’s communication requirements. For example, we are investigating
a ring topology for small conferencing applications (see the Reliable
Messaging protocol instance above), which promises better group performance
than using an IP multicast tree for each participating member.

import java.net.InetAddress;

abstract class GroupSocket {

    // Create an IMA group on rdvPort. The IMA group address
    // of this IMA group will be (localhost:rdvPort).
    abstract void open(int rdvPort);

    // Close this IMA group.
    abstract void close();

    // Join the IMA group identified by (rdvAddress:rdvPort).
    abstract void join(InetAddress rdvAddress, int rdvPort);

    // Leave the currently joined IMA group.
    abstract void leave();

    // Receive an ADU (application data unit) from the group.
    abstract GroupDatagram receive();

    // Send an ADU to the group.
    abstract void send(GroupDatagram dgram);
}



Fig. 3. Abstract definition of a GroupSocket as a Java class 



Another important advantage offered by the IMA architecture is the ability
to implement the same service using different available distribution technologies,
as seen by the two variants of the Basic Service protocol, one of which uses
IP multicast, the other Application Layer Multicast. This enables a group application
to be provided in all environments, and facilitates a smooth evolution
process to emerging technologies.
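To make the idea of a protocol instance concrete, the following toy sketch implements the GroupSocket primitives entirely in memory, with all members living in one JVM. It is not one of the four protocol instances described above. `GroupDatagram` is not defined in the paper, so the version here is a hypothetical byte-array wrapper, and `InMemoryGroupSocket` and its registry are illustrative names only.

```java
import java.net.InetAddress;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical ADU container; the paper does not define GroupDatagram.
final class GroupDatagram {
    final byte[] payload;
    GroupDatagram(byte[] payload) { this.payload = payload; }
}

// Same primitives as the abstract class of Fig. 3.
abstract class GroupSocket {
    abstract void open(int rdvPort);
    abstract void close();
    abstract void join(InetAddress rdvAddress, int rdvPort);
    abstract void leave();
    abstract GroupDatagram receive();
    abstract void send(GroupDatagram dgram);
}

// Toy protocol instance: groups are lists of sockets in a shared map.
final class InMemoryGroupSocket extends GroupSocket {
    private static final Map<String, List<InMemoryGroupSocket>> GROUPS = new HashMap<>();
    private final Queue<GroupDatagram> inbox = new ArrayDeque<>();
    private String opened;  // address of the group this socket created, if any
    private String joined;  // address of the group this socket has joined, if any

    private static String key(InetAddress address, int port) {
        return address.getHostAddress() + ":" + port;
    }

    @Override void open(int rdvPort) {
        opened = key(InetAddress.getLoopbackAddress(), rdvPort);
        GROUPS.put(opened, new ArrayList<>());
    }

    @Override void close() {
        if (opened != null) {
            GROUPS.remove(opened);
            opened = null;
        }
    }

    @Override void join(InetAddress rdvAddress, int rdvPort) {
        String k = key(rdvAddress, rdvPort);
        List<InMemoryGroupSocket> members = GROUPS.get(k);
        if (members == null) {
            throw new IllegalStateException("no such group: " + k);
        }
        members.add(this);
        joined = k;
    }

    @Override void leave() {
        if (joined != null) {
            List<InMemoryGroupSocket> members = GROUPS.get(joined);
            if (members != null) {
                members.remove(this);
            }
            joined = null;
        }
    }

    // Returns the next pending ADU, or null if none has arrived.
    @Override GroupDatagram receive() {
        return inbox.poll();
    }

    // Delivers the ADU to every other member of the joined group.
    @Override void send(GroupDatagram dgram) {
        List<InMemoryGroupSocket> members = joined == null ? null : GROUPS.get(joined);
        if (members == null) {
            throw new IllegalStateException("not joined to an open group");
        }
        for (InMemoryGroupSocket member : members) {
            if (member != this) {
                member.inbox.add(dgram);
            }
        }
    }
}
```

An application written against `GroupSocket` would not change if this class were replaced by an instance built on IP multicast or on an overlay of unicast connections, which is the separation the architecture aims for.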



4 Related Work 

Reliable Multicast ProXies (RMX) PJ splits a large heterogeneous multicast 
group into a number of co-located and homogeneous data groups, each with its 
own RMX. Data sent to the group is distributed between RMX’s using TCP, and 
within a data group using Scalable Reliable Multicast (SRM) |Z| . The principle of
Application Level Framing is followed, allowing insertion of application semantics
in an RMX. An application can specialize an RMX in terms of data reliability, 
transmission scheduling and dynamic data transformation. 

RMX defines an overlay network architecture which requires the placement
of third-party servers in the network, while the IMA is designed to be able to
provide group communication without any network support, even between
endsystems.

Nevertheless, the IMA is able to support a two-tier approach as proposed
by RMX. We propose inter-domain unicast tunnels with an IP multicast-based 
protocol in local subnets as a solution to inter-ISP multicast traffic. This is 
described in m- 

Narada [3 presents an excellent case for Application Layer Multicast, which 
they refer to as end-system multicast. The arguments presented in this paper 
are perfectly valid in the context of our work on the IMA. P| does not present 
a multicast architecture, but is rather an investigation into the performance of 
Application Layer Multicast compared to IP multicast protocols. 

In relation to the IMA, Narada presents a protocol instance providing a 
many-to-many reliable service. Narada implements this service by creating a 
mesh overlay structure between members and running the IP multicast Distance 
Vector Multicast Routing Protocol |0| to create a multicast distribution tree for
each sender in the group.

5 Conclusions 

Research into group communication has for a long time been neglected in favor of 
multicast technology. Now that the advantages of both network and application- 
layer multicast approaches have become recognized, a renewed look at group 
communication - independent of multicast technology - is required. 

In this paper we define group communication as an abstract service which 
can be separated from the multicast technology used to implement it. We wish 
to make it clear that multicast is simply a supporting technology, and that the 
group application service can in fact be provided without this technology. 

In Section 2 we propose a group communication service model for the Internet.
This service definition hides the multicast (or unicast) technology used
to implement a group communication service, which means that group applications
can evolve to new technologies (such as IP multicast) without requiring
application re-engineering. For the same reason, it makes it possible to support
a homogeneous group communication service irrespective of the underlying network.

In Section 3 we present the Internet Multicast Architecture, a unifying architecture
which implements the group model of Section 2 as a middleware
group communications layer. The main element of this architecture is the GroupSocket
class, which defines a standard interface to group communication protocols
whether they distribute data using “traditional” IP multicast, or an Application
Layer Multicast approach.

The IMA provides the opportunity to develop protocol instances as a function
of the group application needs and the available distribution technology
(multicast or unicast). This enables protocols to be developed to best meet the
application needs, which is not possible when only IP multicast is used. The
IMA also requires no network deployment, and can at its simplest operate directly
between endsystems.

6 Future Directions

The Application Layer Multicast approach opens some other interesting avenues 
of research which we are exploring. 

Experience has shown that the separation of multicast distribution and end-to-end
services at the network-endsystem boundary makes the provision of group-wide
services such as reliability and flow control difficult. ALM, on the other hand,
allows the close integration of end-to-end services with the multicast distribution.
We are leveraging this close association to develop a working reliable multicast 
protocol instance for general use in the Internet, an as yet unresolved academic 
problem. 

Another interesting aspect of Application Layer Multicast is the possibility to
use adaptive routing of data between members of a group. This promises better 
congestion avoidance and network load balancing in reliable multicast groups. 
A future paper m presents some promising results for this approach. 

Application Layer Multicast also provides us with the freedom to use other
distribution structures than the multicast tree imposed by IP multicast. It has been
suggested that a ring structure can be more suitable for reliable communications in some
situations than a tree. We are part of an international cooperation developing
Internet group applications and investigating different distribution structures
for group communications.

Abstract. The Routing Information Protocol (RIP) may occasionally introduce
misleading routing information into the routing table, due to network topology 
changes such as link or router failures. This is known as the “counting to 
infinity” problem. In the past, the distance metric had to be below 16 hops, in 
order to keep this counting within reasonable limits. In this paper a more 
elaborate approach is presented in order to recognize those router interfaces, 
which might have received misleading routing messages. This is accomplished 
by evaluating routing updates more carefully than is done by the well known 
split horizon approach. In contrast to other approaches, the router interfaces are 
examined in pairs to determine if a loop exists between them. The algorithm 
locally extracts all the information it needs from the normal update messages 
that are exchanged between RIP neighbors and is thus executed in constant 
time. Only some minor calculations have to be carried out to gain the 
knowledge that is necessary to recognize those interfaces which may have 
received misleading routing information. Hence, this distance vector routing 
without “counting to infinity” can be used in complex networking 
environments. 

Keywords: Routing Information Protocol, Internet, Protocol Design, Counting 
to Infinity 



1 Introduction 

The major advantages of distance vector routing (DVR) are minimal exchanges of 
routing messages, minimal administration, and minimal memory and processing 
requirements. DVR is therefore still widely used in spite of its known weakness:
it may suffer severely from network component failures because of the
“counting to infinity” problem.

This paper analyzes the preconditions of routing loops. It examines the situations 
which are prone to routing loops, shows how to avoid routing loops and proves the 
acquired solution. Simulation results are presented to give a comparative insight into 
the behavior of conventional RIP and the newly developed Routing Information 
Protocol with Minimal Topology Information (RIP-MTI). 
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2 Distance Vector Routing 

DVR protocols exchange very little information among neighbor routers. These 
“route advertisements” consist of destination and metric, where the metric is an 
integer number corresponding to the actual or prescribed distance to the specified 
destination network. 

In this paper only one destination network, called d, is regarded at a time. This 
convention is used throughout (see fig. 1). 

Routing is understood as the process of providing routing information for routing 
tables, whereas forwarding is the delivering of packets by using the routing table 
information. In most DVR protocols the routing table contains the cost of the shortest 
path and the next router address in the path to the destination network. Routers 
directly connected to destination d know the link cost, and send this information to 
their neighbors. Other routers compute the shortest path to d as the minimum of the 
sum of the information received from their neighbors and the link costs to these 
neighbors. 

Within RIP, all routers know the destination addresses to their neighboring 
networks and send them to their neighboring routers with a metric of 1. These routers
themselves send the received destination address with metric 2 to their neighboring 
routers. This process is repeated until all routers within the autonomous system (AS, 
region of the Internet under the administrative control of a single entity) know all 
destination addresses and are provided with reachability information for all 
destinations within the AS. 

Fig. 1. Legend for the network/router diagrams (symbols shown: network; destination
network; distance metrics; router with distance 2 to the destination network d; source
router with distance 1 to the destination network d; sending router; route
advertisement; forwarding direction)

In order to calculate the shortest path to the destination network, only the smallest
metric is stored in the routing tables. Routers send routing updates periodically (every 
30 seconds) reporting all the routes learned from their neighbor routers. If a route is 
not advertised for three minutes or more, this route is marked as “unreachable” by 
assigning the “infinity” metric (16) to it. If, however, other routers are still advertising 
a route to this destination, the best among them is selected. Fig. 2 shows an example 
of how routers exchange routing updates. 
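The update rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RIP specification: the table layout and the function names `process_update` and `expire_routes` are hypothetical, and only the constants 16 ("infinity") and three minutes (route timeout) come from the text.

```python
INFINITY = 16        # RIP's "unreachable" metric
ROUTE_TIMEOUT = 180  # seconds without an advertisement (three minutes)


def process_update(table, neighbor, adverts, now):
    """Merge one neighbor's advertisements into the routing table.

    table:   {destination: {"metric": int, "next_hop": str, "seen": float}}
    adverts: {destination: metric advertised by `neighbor`}
    """
    for dest, advertised in adverts.items():
        metric = min(advertised + 1, INFINITY)  # one more hop via this neighbor
        entry = table.get(dest)
        if entry is None:
            if metric < INFINITY:
                table[dest] = {"metric": metric, "next_hop": neighbor, "seen": now}
        elif entry["next_hop"] == neighbor:
            # News from the current next hop is always believed, good or bad.
            entry["metric"] = metric
            entry["seen"] = now
        elif metric < entry["metric"]:
            # Only the smallest metric is kept; longer alternatives are forgotten.
            table[dest] = {"metric": metric, "next_hop": neighbor, "seen": now}


def expire_routes(table, now):
    """Mark routes that were not refreshed within the timeout as unreachable."""
    for entry in table.values():
        if entry["metric"] < INFINITY and now - entry["seen"] >= ROUTE_TIMEOUT:
            entry["metric"] = INFINITY
```

Note that the sketch keeps only the best route per destination, which is exactly the "forgetting" behavior that makes a stale but shorter advertisement look attractive after a failure.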

What are the disadvantages of RIP? RIP is able to produce the right routing table 
entries: it reacts quickly to good (i.e. shorter routes) but incorrect news, and very 
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slowly to bad (i.e. longer routes) but correct news [13]. The main problem is that 
“routing loops” can arise with reachability information circling inside them (fig. 3). 

How do routing loops arise? As soon as a router recognizes that a certain 
destination network is unreachable, it propagates this information throughout the 
system via neighbor to neighbor propagation. During this period, one of the routers 
inside of a loop, which has not yet received the “network unreachable” information, 
sends a wrong but better than infinity update message to its neighboring routers. 
Because this pretended “good news” is considered more valuable than “bad news” all 
routers will accept and propagate this “pseudo good news”, which may finally lead to 
counting to infinity if this “good” news is propagated further inside the routing loop. 

The lifetime of this misleading information in routing tables can be limited if a
low maximum value for the longest path in an autonomous system is given [2]. RIP 
thus only allows 15 routers within one route leading to any destination. If the number 
of routers on a certain path exceeds 15, the advertised destination is considered to be 
unreachable. 

In fig. 3, three routers i, r1 and r2 are connecting four networks. Fig. 2 shows the
same system, however in a state where no counting to infinity has occurred, due to the
fact that the reachability to all networks is given at a distance below “infinity”. In fig.
3, it is shown what happens if network d suddenly becomes unreachable.
Unfortunately, the routers’ reactions contradict one another. Router i sends this “bad
but right” news to the other routers, while r2 advertises that a route to network d still
exists with a metric of 2, which is “good but wrong” news. Router r1 accepts this route to
destination d, because its metric (2) is smaller than “infinity”, and assumes that this
route really exists. Router r1 assigns 3 (2+1) as the cost and r2 as the direction to
network d. Router r1 sends this information to router i in fig. 3 (c).




Fig. 2. Building up complete routing information for destination d (“cold start”) 




Fig. 3. Routing loop after destination d becomes unreachable

Router i wrongly assumes that the only possible route to reach the destination d is
via r1. Router i again wrongly advertises the new route (with metric 4) to router r2 in
fig. 3 (d). The metric for destination d in the routing table of router r2 is seen as not
up-to-date, therefore, router r2 wrongly corrects the metric for this route to 5. This
procedure goes on within this routing loop until the metric reaches “infinity” in all
routing tables, which makes RIP very inefficient in this case. By applying some
heuristics, which will be explained below, this inefficiency of RIP can be overcome to a
certain degree.
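The loop of fig. 3 can be replayed with a short Python sketch. It is a deliberately simplified, strictly sequential replay (r2 informs r1, r1 informs i, i informs r2, and so on), not the simulator used later in this paper; the function name `count_to_infinity` and the starting state are assumptions chosen to match the narrative above.

```python
def count_to_infinity(infinity=16):
    """Replay the stale update circling r2 -> r1 -> i -> r2 -> ... (fig. 3).

    Returns the number of update messages sent until every router holds
    the "infinity" metric for destination d, and the final metrics.
    """
    # Router i has just lost d, r1 has already heard the bad news,
    # r2 still holds its stale metric of 2.
    metric = {"i": infinity, "r1": infinity, "r2": 2}
    order = ["r2", "r1", "i"]  # each router informs the next one in the ring
    updates = 0
    while any(m < infinity for m in metric.values()):
        sender = order[updates % 3]
        receiver = order[(updates + 1) % 3]
        metric[receiver] = min(metric[sender] + 1, infinity)
        updates += 1
    return updates, metric


if __name__ == "__main__":
    for limit in (16, 31, 61):
        print(limit, count_to_infinity(limit)[0])
```

In this simplified model the number of wasted updates grows with the value chosen for "infinity", which is why a low maximum metric limits the damage but does not remove the cause.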



2.1 Improving Convergence 

To improve the convergence of RIP, four concepts have been suggested: 

- triggered updates 

- split horizon 

- poison reverse 

- path hold-down 

These concepts have been integrated into different RIP implementations, but they 
do not solve the “counting to infinity” problem completely. 

Triggered updates are sent immediately after any change of a routing table entry. 
This was proposed to spread changes of topology faster than before. Routers do not 
have to wait until the next regular update. The main disadvantage of triggered update 
is that the routing traffic rises strongly after a failure of a network component. To 
reduce this effect, routers are forced to wait a random length of time before they send 
a triggered update. 

Split horizon avoids some of the incorrect routing advertisements: 

“A router does not send outgoing route advertisements back to the router from which 
they were learned.” 

This approach reduces the probability of routing loops. It does not, however, avoid 
them. In any routing loop, there are destination networks, which are advertised with 
the same metric via two or more interfaces of a router. If a router decides to accept 
one of its neighbor’s routes as the best route, and sends this wrong but perceived good 
routing update to other routers, a routing loop may be created. 
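As an illustration, split horizon can be expressed as a filter applied when a router builds the advertisement for one outgoing interface. The sketch below is an assumption-laden example: the table layout and the name `build_advertisement` are hypothetical, and filtering per interface rather than per neighbor router is a simplification. The optional `poison_reverse` flag shows the related variant, in which the learned routes are advertised back with the "infinity" metric instead of being omitted.

```python
INFINITY = 16


def build_advertisement(table, out_interface, poison_reverse=False):
    """Build the routes to advertise on `out_interface`.

    table: {destination: (metric, interface_the_route_was_learned_on)}
    """
    advert = {}
    for dest, (metric, learned_on) in table.items():
        if learned_on == out_interface:
            if poison_reverse:
                advert[dest] = INFINITY  # advertise back as unreachable
            continue                     # plain split horizon: stay silent
        advert[dest] = metric
    return advert
```

For a table `{"d": (2, "A"), "n": (1, "B")}` the advertisement sent on interface A contains only `n`, so the neighbor on A never hears its own route to `d` echoed back.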

Path hold-down establishes a period of time (typically 60 sec.) during which a router 
will ignore new routing information about a given network, once the router has 
learned that this network is unreachable, i.e. has been assigned the metric “infinity”. 
According to RFC 1009 [1, p24], a hold-down period is chosen long enough to allow 
for the unreachable status to propagate to all routers in the autonomous system (AS). 
The avoidance of additional network load is considered more important than fast 
route re-establishment in case of a failure. Therefore, this is the slowest approach in 
the group, but it should avoid most routing loops. The earlier releases of IGRP used a 
path hold-down technique, the latest releases use poison reverse only [8, p 117]. 
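A hold-down timer can be sketched as a guard in front of the normal "accept the better metric" rule. The example is a minimal sketch, assuming a hypothetical entry layout with an `unreachable_since` timestamp; only the 60 second period is taken from the text.

```python
HOLD_DOWN = 60  # seconds, the typical value quoted above


def should_accept(entry, new_metric, now):
    """Decide whether new routing information for a network is accepted.

    entry: {"metric": int, "unreachable_since": float or None}
    """
    since = entry.get("unreachable_since")
    if since is not None and now - since < HOLD_DOWN:
        return False  # still holding down: ignore all news about this network
    return new_metric < entry["metric"]
```

The price of this guard is visible in the code: even a perfectly valid alternative route is rejected until the timer has run out.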
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2.2 Other Approaches 

One way of avoiding the counting to infinity problem is to recognize routing loops by 
gaining some knowledge about the path back to the source router. Cheng et al. [2]
and Rajagopalan and Faiman [7] suggested appending the router nearest to the
destination (which is called the head of path) to the routing update. With this 
information, it is possible to trace back the path from the destination to the source 
router. This back-tracing, which needs additional exchange of information between 
routers, allows the recognition of routing loops within the path. 

Including the router-label nearest to the neighbor router in the update message in 
order to obtain a three node path knowledge was proposed by Shin and Chen [12] to 
avoid two-node looping. This algorithm can be extended to a k-th order algorithm 
which avoids all loops with more than k hops. However, this leads to an increase in 
the size of update messages and the local memory requirement increases in proportion 
to k. A DVR protocol developed for routing between AS-domains, called the Border 
Gateway Protocol (BGP), specifies the entire path from source to destination in the 
update message. 

Due to the fact that a routing loop needs certain timing conditions to develop, 
routing loops can be avoided by coordinating the exchange of routing information. 
Jaffe and Moss [5] showed that no routing loops can occur in DVR algorithms after a 
link addition or a link-cost decrease. Their protocol requires inter-nodal coordination 
if link costs increase or resources fail in the network. Garcia-Luna-Aceves proposed 
the Diffusing Update Algorithm (DUAL) [3,4], where the routers coordinate 
themselves mutually by confirmation messages. With these messages, a router can 
find out if the routing update procedure it has initiated to all its neighbors has 
terminated already. If not, the router is blocked: it accepts no further update messages, 
thereby avoiding the creation of routing loops. The implementation of this method is 
extensive. It cannot be carried out as an add-on to RIP. According to
Garcia-Luna-Aceves [3,4] there are three feasibility conditions, each of which guarantees
freedom from routing loops (Distance Increase Condition DIC, Current Successor
Condition CSC, and Source router Node Condition SNC). It should be noted that these
conditions guarantee the absence of routing loops; however, they are not able to
calculate shortest paths. DUAL is used by Cisco, a router hardware vendor, under the
name EIGRP (Enhanced Interior Gateway Routing Protocol). A coordinated diffusing
computation is initiated only when distances increase.

In summary, it can be said that the reluctance toward RIP in the Internet community can
be traced back to the unsolved problems with “counting to infinity” and the resulting 
slow convergence of RIP. All the aforementioned solutions require the extension of 
routing messages in order to function properly. 



3 The RIP-MTI Approach^1

In order to avoid routing loops locally without any change of RIP messages, RIP’s
behavior of forgetting all but the shortest path is changed. “Minimal Topology
Information” (MTI) can be derived from the information provided by the other paths.



^1 see [9,10,11]
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The RIP-MTI approach is based on two simple ideas: 

• router-local recognition of routing loops between all interfaces of single 
routers 

• router-local recognition of source loops by the conditions given later in the 
theorems 4, 5 and 6 

Source loops are loops which pass through two source router interfaces more than 
once on their way through the networks and routers (see fig. 5). 

In two local tables on each router, the existence of loops between a pair of 
interfaces and the network environment are stored. A loop is recognized by examining 
reachability information for the same destination network, found on two different 
router interfaces. Thus, a loop can be recognized by two routing updates which 
contain the same destination network and arrive over different interfaces of the same 
router. Important: the loop information stored consists of the beginning and the 
ending router interface and the distance metric for the loop. 

This additional loop information is exactly what is needed to be able to decide 
whether a destination network d can be reached through more than one interface. This 
is the fundamental idea of MTI.

The RIP-MTI algorithm also makes use of the other concepts for suppressing 
“counting to infinity”. RIP-MTI routers must use split horizon and triggered updates, 
may use poison reverse, and of course must use MTI, which can be seen as a very fast 
conditional path hold down for wrong alternative routing updates. The further 
explanation is based on link costs of 1 - it will be shown in section 3.5 how a link
metric with arbitrary costs can be used as well.

Loops may have two meanings for RIP-MTI. First, they may be used to offer 
alternative reachabilities to a destination network if they do not pass any router more
than once and one path to the destination network fails, which is referred to as “cycle” 
later on. Second, they may be memories for wrong reachability information if they 
pass a router more than once. This is referred to as “source loop” later on. 

Fig. 4 depicts the behavior of router i knowing that there has been no cycle 
between interface A (connects destination network d) and interface C (connects 
network which contains router r1). Router i rejects the wrong alternative routing
update from interface C because it detects a source loop. 




Fig. 4. Routing loop avoided after destination d becomes unreachable.
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3.1 Definitions 

Definition 1 (Path): A path from router $i = r_1$ to subnet $d = s_L$ is abbreviated as
$P^{i,d}$ and is a sequence of routers and subnets:

$P^{i,d} = (r_1 = i, s_1, r_2, s_2, \dots, r_L, s_L = d)$

L is the total length of the path. A path from i to d over interface A is abbreviated as
$P_A^{i,d}$ and a path that returns back to router i over interface B as $P_{A,B}^{i,d}$.

Definition 2 (Route): A route from router i to subnet d over interface A is
defined as $m_A^{i,d} = L$, when a path $P_A^{i,d} = (r_1 = i, s_1 = A, \dots, s_L = d)$ exists.

A routing update $\langle m^{i,d} = 1 \rangle$ from router i received by router r2 over
interface A is a route $m_A^{r2,d} = m^{i,d} + 1$.

It should be pointed out, that a path/route is never static. A path/route cannot be 
assembled out of the routing information stored in the routing tables of the routers at 
any particular point in time. Time has a very important role, because it always takes 
some amount of time for a data packet to traverse a given path: during this time, the 
routing table entries can change because of routing updates the routers may have 
received in the meantime. The result is, that the path the message traverses, can 
contain loops, but this is not the case when looking at the routing tables at any single 
point in time. 

A composite back-route between the interfaces A and B of a router via destination
d is a necessary criterion for a loop between interfaces A and B. In an acyclic network
topology there are no composite back-routes.

Definition 3 (Composite back-route): A composite back-route between two
interfaces A and B of router i via a destination d exists, if the destination d is
advertised as being reachable via interface A and via interface B. Then these two paths
can be combined into one composite path

$P_{A,B}^{i,d} = (i, A, \dots, d, \dots, B, i)$

Because a composite back-route is composed of two routes $m_A^{i,d}$ and $m_B^{i,d}$, the
metric of a composite back-route can be calculated as follows:

$m_{A,B}^{i,d} = m_A^{i,d} + m_B^{i,d} - 1 = L$

A routing loop is a necessary condition for the “counting to infinity” behavior. 

Definition 4 (Routing loop): A routing loop is a special route $m_A^{i,d}$ (or a special
composite back-route $m_{A,B}^{i,d}$) which passes through at least one router more than
once in its corresponding path.



The source loop is a key concept of this work.

Definition 5 (Source loop): A route $m_A^{i,d}$ (or a composite back-route $m_{A,B}^{i,d}$) is a
source loop if it passes the source router i (which receives the route advertisements)
more than once. (In fig. 4c router i recognizes that $m_C^{i,d}$ is a source loop.)

The following definition of a cycle is temporary in the sense that it specifies a 
cycle but can be reduced later to the minimal cycle. 

Definition 6 (Cycle): A cycle $\mathit{cyc}_{A,B}^{i,d}$ between two interfaces A and B via the
destination d is a special composite back-route $m_{A,B}^{i,d}$ between A and B via the
destination d, whereby the source router must not be traversed along the route. (In fig.
4 there is a cycle between interface B and C.)

It is possible to define the "smallest metric" of a loop independent of the
destination d as a representative of all loops between a pair of interfaces:

Definition 7 (Minimal cycle): The minimal cycle between two interfaces A and B is
$\mathit{mincyc}_{A,B}^{i} = \min\{\mathit{cyc}_{A,B}^{i,d} \text{ for all destinations } d\}$

The notion of the minimal return route via an interface A is important for further
explanations. It is also part of the sufficient cycle criterion (refer to the corollary,
theorem 6).

Definition 8 (Minimal return route): The minimal return route via an interface A is
$\mathit{minm}_{A}^{i} = \min\{\mathit{mincyc}_{A,S}^{i} \text{ for all interfaces } S \text{ of router } i\}$

It is crucial for this definition that cycles are the base for minimal return routes, not
composite back-routes. Imagine the minimal return route to be the shortest possible
route of a conventional message leaving the source router i via the interface A and
returning after forwardings to the source router i.
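The quantities introduced in definitions 3, 7 and 8 can be computed with a few helper functions. The following Python sketch is only an illustration: the function names and the dictionary layout `{(interface, interface): {destination: cycle metric}}` are assumptions, and returning `None` when no cycle is known is a choice made for the example, since this section does not say how that case is handled.

```python
def composite_back_route(m_a, m_b):
    """Metric of the composite back-route built from the routes via A and via B
    (hop-count metric; the destination subnet is counted only once)."""
    return m_a + m_b - 1


def minimal_cycle(cycles_by_destination):
    """Definition 7: smallest cycle metric over all destinations d."""
    return min(cycles_by_destination.values())


def minimal_return_route(interface, cycles):
    """Definition 8: smallest minimal cycle over all interface pairs that
    contain `interface`.

    cycles: {(X, Y): {destination: cycle metric}}
    Returns None if no cycle through `interface` is known.
    """
    metrics = [
        minimal_cycle(by_dest)
        for pair, by_dest in cycles.items()
        if interface in pair and by_dest
    ]
    return min(metrics) if metrics else None
```

With `cycles = {("B", "C"): {"d1": 5, "d2": 3}, ("A", "B"): {"d1": 4}}`, the minimal return route via B is taken from the B/C pair, because that pair holds the shortest known cycle.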



3.2 Avoiding Source Loops 

The existence of two distinct routes to the same destination - a composite back-route - 
is a necessary, but not sufficient criterion for a cycle. 

Therefore all composite back-routes with a source loop are shown in one table 
(fig. 5). The source loop corresponds to a path which passes through the source 

router i two times: it leaves router i via interface A, returns to interface A for the first 
time, leaves i again and finally enters i for a second time via interface B. 

A source loop can enter and leave the router via nine different network 
combinations: 

The corresponding path must enter router i again via interface A, interface B or any
other interface S ≠ A or B and must leave router i via interface A, interface B or any
other interface S ≠ A or B.
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Fig. 5. Enumeration of all possible source loops within a composite back-route. 

Fig. 5 provides a complete listing of all source loops using a “rubber band” tightly
connected to the interfaces A and B, entering and leaving the router using all possible
interface combinations. In fig. 5, the route $m_A^{i,d}$ was chosen to contain the source
loop.

For all nine cases the path leaves router i via interface A first and finally enters 
router i via interface B at the end. Vertically, the interface where the path reenters 
router i for the first time is varied, i.e. interface A, B or any other interface other than 
A or B. Horizontally, the interface where the path leaves router i for the second time is 
varied, i.e. interface A, B or any other interface other than A or B. 

The source loops can be differentiated by the following features of their paths: 

- ESH (External Split Horizon): The path leaves router i for the first time and 
returns to it via the same interface. 

- ISH (Internal Split Horizon): The path enters router i for the first time and 
leaves it for the second time via the same interface. 

- Y combination: The path leaves router i for the second time and enters it in the 
end via the same interface. 

- X combination: The interfaces via which the path leaves router i and returns to 
it for the first time are different, and the same for the interfaces via which the 
path leaves and returns to router i for the second time. 
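The four features listed above can be applied mechanically to the nine interface combinations. The Python sketch below does this under two assumptions that are not taken from fig. 5: every interface other than A or B is represented by the single symbol "S", and a combination may carry more than one feature, whereas the figure may show only one label per cell.

```python
from itertools import product

FIRST_LEAVE, FINAL_ENTER = "A", "B"
CHOICES = ("A", "B", "S")  # "S" stands for any interface other than A or B


def features(reenter, leave_again):
    """Features of the source loop that re-enters router i via `reenter`
    and leaves it for the second time via `leave_again`."""
    tags = set()
    if reenter == FIRST_LEAVE:
        tags.add("ESH")  # leaves and first returns via the same interface
    if reenter == leave_again:
        tags.add("ISH")  # enters and leaves again via the same interface
    if leave_again == FINAL_ENTER:
        tags.add("Y")    # second exit and final entry share an interface
    if reenter != FIRST_LEAVE and leave_again != FINAL_ENTER:
        tags.add("X")    # both passes use two different interfaces
    return tags


cases = {(r, l): features(r, l) for r, l in product(CHOICES, repeat=2)}
split_horizon_cases = [c for c, tags in cases.items() if tags & {"ESH", "ISH"}]

if __name__ == "__main__":
    for case, tags in cases.items():
        print(case, sorted(tags))
```

Under these assumptions five of the nine combinations carry an ESH or ISH feature, which agrees with the count of five given for the split horizon rule in the following subsection.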

This differentiation has been applied to fig. 5. ESH and ISH can be avoided by 
applying the split horizon rule, which is proved later on. 
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Theorem 1 

If source loops are avoided by every router locally, routing loops do not occur
globally.



Proof by contradiction: let the path $P^{i,d}$ contain a routing loop. Routing loops contain
at least one router $r_x$ which is traversed twice:

$P^{i,d} = (r_1 = i, s_1, \dots, r_x, s_x, \dots, r_y = r_x, \dots, r_l, s_l = d)$

From the viewpoint of router $r_x$, the second traversal of $r_x$ in this path is a source loop.

$P^{r_x,d} = (r_x, s_x, \dots, r_y = r_x, s_y, \dots, r_l, s_l = d)$

With router $r_x$ avoiding this source loop, the source loop does not exist in its routing
tables and is not propagated to other routers. The result is that path $P^{i,d}$ cannot be a
routing loop in which router $r_x$ is passed more than once.



In order to show the interrelation between source loops (SL), routing loops (L) and
counting to infinity (C) the following conditions hold:

$\neg\exists SL \Rightarrow \neg\exists L$

$\neg\exists L \Rightarrow \neg\exists C$

and therefore

$\neg\exists SL \Rightarrow \neg\exists C$



The first condition is stated by theorem 1, which has been proved already. The second 
condition can be proved by the following observation: Counting to infinity behavior 
(C) can be defined as: “C is initiated, if reachability information of a destination 
subnetwork d runs repeatedly inside a cycle until the advertised distance d reaches the 
predefined maximum value.” The route to d contains a loop when it enters the cycle 
for the second time. Hence, if routing loops are avoided, there is no counting to 
infinity behavior. 



Impossible Composite Back Routes 

If the split horizon rule is used by the RIP algorithm it can be shown, that five out of 
nine source loops do not have to be considered, because they do not happen: the 
entries labeled by “ISH” (internal split horizon) and “ESH” (external split horizon) in 
fig. 5. 

Theorem 2 (Internal Split Horizon, ISH): 

If the split horizon rule is used, the source loops marked with “ISH” never occur
because they are no composite back-routes $m_{A,B}^{i,d}$.

Proof: Two route combinations $m_{A,B}^{i,d}$ contain a route $m_A^{i,d}$ with the corresponding
path

$P_A^{i,d} = (i, \dots, A, i, A, \dots, d)$ or $P_A^{i,d} = (i, \dots, B, i, B, \dots, d)$

That means that the interface A (B) could serve as in and out interface in the path
$P_A^{i,d}$. These composite back-routes are avoided by the source router i which applies
the split horizon rule (“A router does not send outgoing routing updates back to the
router from which it has learned this route”). In a path

$P^{i,d} = (r_1 = i, s_1, r_2, s_2, \dots, r_l, s_l = d)$,

the condition $s_x \neq s_{x+1}$ holds for any router $r_{x+1}$ $(1 \le x \le l - 1)$.
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Theorem 3 (External Split Horizon, ESH): 

If the split horizon rule is used, the source loops marked with “ESH” never occur
because they are no composite back-routes $m_{A,B}^{i,d}$.

Proof: Three composite back-routes $m_{A,B}^{i,d}$ contain a route $m_A^{i,d}$ with the corresponding
path

$P_A^{i,d} = (r_1 = i, s_1 = A, r_2 \in NR(A), \dots, r_{x-1} \in NR(A), s_{x-1} = A, r_x = i, \dots, d)$

with NR(A) = {neighbor routers of interface A}. These composite back-routes are avoided
by the neighbor routers applying the split horizon rule. They do not send those
routes, which they have learned from router i, back to router i, because router i
compared with the neighbor routers holds the shortest metric to destination d, and
therefore router i always is the next router for destination d in the routing tables of the
neighbor routers.

Avoiding X Combinations 

Fig. 6 gives an example for a composite back-route with an X combination. In this 
case there is a composite back-route between interfaces A and B. 




Fig. 6. Example of a composite back-route with an X combination. 

From the viewpoint of router i the part of the AS including interface A and the part of 
the AS including interface B are separate. To get from the one part of the AS to the 
other, the source router i has to be traversed. It would not make any sense to use a 
route via interface A as an alternative route to destination d reachable via interface B. 
X combinations are constructed by concatenating two different cycles, in this case the 
upper and lower cycle. The X combination would leave the router via interface A,
take a return route back to the source router i, enter the router via an interface different
from A, leave the source router once again via another interface other than A or B and at
last return via interface B. X combinations can be avoided by making sure that the
metric of the route combination is less than the sum of the two minimal return routes
via the first (A) and the second interface (B).

Theorem 4:

Let $m_{A,B}^{i,d}$ be a composite back-route. If
$\mathit{minm}_A^{i} + \mathit{minm}_B^{i} > m_{A,B}^{i,d}$ then the composite
back-route $m_{A,B}^{i,d}$ does not contain an X combination.

Proof: A composite back-route with an X combination has a path
$P_{A,B}^{i,d} = (i, A, \dots, s_1 \neq A, i, s_2 \neq B, \dots, d, \dots, B, i)$
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The first part of this path, from the first occurrence of i to the second occurrence of 
i, is a return route via interface A. The second part of the path, from the second 
occurrence of i to the third occurrence of i is a return route via interface B. Therefore, 
the shortest X combination is the sum of the minimal return routes via interface A and
B ($\mathit{minm}_A^{i} + \mathit{minm}_B^{i}$).

If the composite back-route $m_{A,B}^{i,d}$ is shorter than the shortest possible X
combination, then it cannot contain an X combination.

Avoiding Y Combinations 

Fig. 7 shows an example of a composite back-route with a Y combination. The
problem with Y combinations is as follows. Router i knows the route to destination d
via interface B, because d is directly connected to i. If router r1 learns from r2 that it
can reach d via r2, r1 will send an update to router i containing
$\langle m^{r1,d} = 3 \rangle$.




Fig. 7. Example of a composite back-route with a Y combination.

Router i then has to decide whether d can be reached via interface A too. The route via 
interface B is correct, but the route via interface A contains a source loop, which is 
constructed as follows: the route leaves router i via interface A, returns to i via a
different interface and takes the first route via interface B directly to d. 

The minimal return route via interface A advertising the alternative route with the 
higher metric plays the crucial role here. Y combinations can be avoided by applying 
the following condition: 

Theorem 5:

Let $m_{A,B}^{i,d}$ be a composite back-route combining the two route advertisements
$m_A^{i,d}$ and $m_B^{i,d}$; $m_A^{i,d} > m_B^{i,d}$. If
$\mathit{minm}_A^{i} > m_A^{i,d} - m_B^{i,d}$ then the composite back-route does
not contain a Y combination.

Proof: The longer route $m_A^{i,d}$ within $m_{A,B}^{i,d}$ belongs to the path

$P_A^{i,d} = (i, A, \dots, i, B, \dots, d)$.

The first part of this path from the first to the second occurrence of i is a return
route via interface A and the second part is the corresponding path of $m_B^{i,d}$.
Therefore $m_A^{i,d}$ is the sum of any return route via interface A and $m_B^{i,d}$.

If $m_A^{i,d}$ is smaller than the sum of the minimal return route via interface A and
$m_B^{i,d}$:

$m_A^{i,d} < \mathit{minm}_A^{i} + m_B^{i,d}$

then $m_A^{i,d}$ does not contain a return route via A and is not a source loop.

Avoiding Counting to Infinity in Distance Vector Routing 669

3.3 The RIP-MTI Algorithm 

In an acyclic AS it is not possible for any router to reach an arbitrary destination by 
two different routes over two different interfaces. If a router has a choice between 
different interfaces to a destination d, the AS must be cyclic, with the result that after 
the failure of a route to destination d it can possibly be replaced by an alternative 
route. But after such a failure the router must not accept any route to destination d as 
an alternative route: if the route is a member of a source loop this may lead to 
“counting to infinity”. 

With theorem 6 it is possible to decide whether two routes concatenated to build a 
composite back-route form a source loop or a cycle. 

Theorem 6 (Corollary):

Let $m_{A,B}^{i,d}$ be a composite back-route and $m_A^{i,d} > m_B^{i,d}$. All routers use the
“split horizon” rule. If

a) $\mathit{minm}_A^{i} + \mathit{minm}_B^{i} > m_{A,B}^{i,d}$ (to avoid X combinations) and

b) $\mathit{minm}_A^{i} > m_A^{i,d} - m_B^{i,d}$ (to avoid Y combinations)

then $m_{A,B}^{i,d}$ is a cycle $\mathit{cyc}_{A,B}^{i,d}$ and therefore neither a source loop nor a
routing loop, and no “counting to infinity” can arise.
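The two conditions of theorem 6 translate directly into a local test. The Python sketch below is an illustration under assumptions, not the authors' implementation: the name `is_cycle` is hypothetical, the minimal return routes are assumed to be known already, and swapping the interfaces when the route via B is the longer one is a relabeling added for convenience.

```python
def is_cycle(m_a, m_b, minm_a, minm_b):
    """Return True if the composite back-route built from the routes
    m_a (via A) and m_b (via B) passes both tests of theorem 6.

    minm_a, minm_b: minimal return routes via interface A and B.
    All values are hop-count metrics.
    """
    if m_a < m_b:
        # Relabel so that "A" is the interface carrying the longer route.
        m_a, m_b = m_b, m_a
        minm_a, minm_b = minm_b, minm_a
    m_ab = m_a + m_b - 1                 # composite back-route metric
    no_x = minm_a + minm_b > m_ab        # condition a)
    no_y = minm_a > m_a - m_b            # condition b)
    return no_x and no_y
```

For the situation of fig. 7, router i holds $m_B^{i,d} = 1$ and derives $m_A^{i,d} = 3 + 1 = 4$ from the update sent by r1. With hypothetical minimal return routes of 4 on both interfaces, `is_cycle(4, 1, 4, 4)` accepts the alternative; with a minimal return route of 3 via A, `is_cycle(4, 1, 3, 4)` rejects it, because 3 is not greater than 4 - 1.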

3.4 Calculating Minimal Return Routes 

Up to this point, the minimal return route $\mathit{minm}_A^{i}$ via an interface A was assumed
to be given. By looking at the construction of the X and Y combination composite
back-routes in detail it can be seen that in most of the cases it is possible to use the
composite back-route metric $m_{A,B}^{i,d}$ as upper bound of $\mathit{minm}_A^{i}$ and
$\mathit{minm}_B^{i}$. Only in the case of a Y combination can the composite back-route metric
not be used as an upper bound of $\mathit{minm}_A^{i}$ where $m_A^{i,d} > m_B^{i,d}$.
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3.5 Overcoming Minimum-Hop Routing 

The problem with using a non-minimum-hop metric is that the formula for
calculating composite back-routes, $m_{A,B}^{i,d} = m_A^{i,d} + m_B^{i,d} - 1$, can be used for
minimum-hop metrics only. One way to overcome this problem is to extend the
routing update message and the routing tables with the hop count metric as additional
information besides the actual metric. The RIP-MTI algorithm is executed with this
hop count metric only, calculating minimal cycles and minimal return routes using the
previously described method. In this case, extending the routing protocol cannot be
avoided.
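One possible shape of such an extended advertisement is sketched below in Python. The class `Advert`, its field names and the function `receive` are hypothetical; the sketch only illustrates the idea of carrying a hop count next to the actual metric and says nothing about the real message layout.

```python
from dataclasses import dataclass


@dataclass
class Advert:
    destination: str
    cost: int  # actual (arbitrary) metric, used for route selection
    hops: int  # hop count, carried only for the loop tests of RIP-MTI


def receive(advert, link_cost):
    """Route derived from an advertisement received over a link of `link_cost`."""
    return Advert(advert.destination, advert.cost + link_cost, advert.hops + 1)
```

Route selection would then compare `cost`, while the minimal cycles and minimal return routes are computed from `hops` exactly as in the minimum-hop case.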



4 Simulating RIP-MTI 

4.1 Simulation Environment 

To simulate RIP-MTI, a Java applet^2 was developed that allows the graphical building
of models of networks, and enables switching between the original RIP and the new 
RIP-MTI algorithm [6]. During simulation connections between interfaces and 
networks can be cut, which makes it possible to provoke a “counting to infinity” 
situation. The temporal order in which the routers send their periodic routing updates 
is crucial for the emergence of counting to infinity. Hence the simulation program 
allows timing to be influenced. 

For analysis two different models were chosen (see fig. 8) representing the models 
in fig. 7 and fig. 6. 



4.2 Simulation Results 

Corresponding to the two simulated models shown in fig. 8 every possible temporal 
order was simulated with small and large periods between the routers leaving out 
those orders symmetrically equivalent to already simulated orders. 




Fig. 8. Simulating a Y and X combination 



^2 http://www.uni-koblenz.de/~steigner/ripmti/
There were 86 counting to infinity situations in total when using the original RIP 
algorithm (36 in model 1 and 50 in model 2). All of them were avoided when 
switching to the RIP-MTI algorithm, resulting in much faster convergence. 

Fig. 9 shows how many routing updates were needed in the network in a counting 
to infinity situation to reach convergence using RIP-MTI compared to RIP with 
“infinity” set to 16, 31 and 61. The numbered labels on connectors depict two 
temporal orders in model 1 and four temporal orders in model 2. 



[Figure: vertical axis "number of routing updates"; surviving axis labels: 12543, 13452] 

Fig. 9. RIP-MTI comparative convergence behavior 

RIP-MTI accelerates convergence by 73 to 83% compared with the original RIP 
algorithm. When increasing infinity first to 31 and then to 61, improvements of 
85-91% and 92-95% respectively were seen. This is because an increased value for 
“infinity” does not influence convergence in RIP-MTI. These factors can be 
transferred to the reduction of network traffic and to the duration until convergence, 
because they depend on the number of routing updates needed. 
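
Why the gain grows with the value of “infinity” can be illustrated with a toy model. The sketch below is not the applet used for the simulations above. It assumes only two routers, A and B, no split horizon, and a hypothetical function name `updates_until_convergence`. Router B has lost its direct link to a network N, and the periodic update of A arrives before B can announce the failure, so the two routers count up to “infinity” by exchanging updates. 

```python
def updates_until_convergence(infinity):
    """Count the routing updates two plain-RIP routers exchange until both
    consider network N unreachable (toy model, no split horizon)."""
    # Assumption: A reached N via B with metric 2; B has just lost its link to N.
    metric_a, metric_b = 2, infinity
    updates = 0
    sender = "A"  # A's periodic update happens to be sent first
    while metric_a < infinity or metric_b < infinity:
        if sender == "A":
            # B learns a (stale) route to N via A and adds one hop
            metric_b = min(metric_a + 1, infinity)
            sender = "B"
        else:
            # A's route points to B, so A must accept B's worsened metric
            metric_a = min(metric_b + 1, infinity)
            sender = "A"
        updates += 1
    return updates


if __name__ == "__main__":
    for infinity in (16, 31, 61):
        print(infinity, updates_until_convergence(infinity))
```

In this toy model the number of updates grows linearly with the chosen value of infinity, which is consistent with the observation above that the relative advantage of RIP-MTI increases when infinity is raised. 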



5 Conclusions 

This paper showed that source loops are the cause of routing loops, which are 
responsible for counting to infinity. Therefore two conditions to avoid source loops 
were given and proven. These conditions were integrated into the extended RIP 
algorithm RIP-MTI, storing information about cycles between every pair of local 
interfaces of a router, which is a new and unique approach. Finally it has been shown 
by simulation that RIP-MTI does exactly what has been proved in theory: routing 
loops and counting to infinity behavior are eliminated resulting in much faster 
convergence with minimal demand for additional memory and processing power. 
Abstract. At present, the global Internet consists of many ASes. Each AS pays 
a pre-determined connection fee to another AS for connecting its network with 
that AS’s network. Connection-fee charging may be rational when 
transferring best-effort traffic. However, usage charging is 
necessary for transferring resource-guaranteed traffic such as Intserv 
traffic and Diffserv traffic. In this case, each AS pays a per-flow fee to 
another AS every time it routes a flow into that AS. The per-flow fees paid 
by each AS become a part of the cost for that AS. Thus, each AS needs to 
select a route with the lowest price for each inter-AS flow to improve its profit. 
In this paper, we call such an inter-AS routing scheme a price-based inter-AS 
routing scheme. When an AS has a request to route an inter-AS flow, it can 
select an inter-AS route with the lowest price to improve its profit by this 
routing scheme. Firstly, we propose a method to realize the price-based 
inter-AS routing scheme. Next, we propose a cost-dependent pricing scheme suitable 
for the price-based inter-AS routing scheme. The cost-dependent pricing 
scheme can reduce the frequency of exchanging price information between ASes. 
However, in the cost-dependent pricing scheme, the profit in each AS depends on 
the distribution of path costs in that AS. Thus, we propose a routing policy for 
ASes with narrow ranges of path costs to improve their profits efficiently and 
verify its effect using a simple routing model. 



1 Introduction 

At present, the global Internet consists of many ASes. Here, an AS (Autonomous 
System) means a network provider with its own network control policy for maximizing 
its profit. Each AS pays a pre-determined connection fee to another AS for 
connecting its network with that AS’s network. Connection-fee charging 
may be rational when transferring best-effort traffic. However, it causes 
a serious problem when transferring resource-guaranteed traffic such as 
Intserv traffic [1] and Diffserv traffic [2]. If the connection fee is constant, an 
AS may transfer a vast amount of traffic into another AS, and the network resources 
in the latter AS may be occupied solely by the traffic transferred from the former 
AS. In this way, the constant connection fee results in a lack of network resources 
and unfairness between ASes [3]. 

Usage charging is necessary to solve the problem described above. Each AS pays a 
per-flow fee to another AS every time it routes a flow into that AS. Each AS can 
manage every inter-AS flow well because the number of inter-AS flows becomes 
relatively small by aggregating many flows between a source AS and a destination 
AS into a jumbo flow [4]. Accounting of the packets conveyed on a flow may be 
necessary to realize usage charging strictly. However, the contracted 
bandwidth and duration of a flow can, for example, be substituted for the number of 
packets in the case of the expedited service and the assured service mentioned in 
the Diffserv specification [2]. Of course, the price rate of a flow should vary 
according to the class of service that the flow belongs to. 

The per-flow fees paid by each AS become a part of the cost for that AS. Thus, 
each AS needs to select a route with the lowest price for each inter-AS flow to 
improve its profit. In this paper, we call such an inter-AS routing scheme a 
price-based inter-AS routing scheme. The price-based inter-AS routing scheme can be 
realized by exchanging route price information between ASes. For example, 
price-vector information needs to be exchanged in addition to the path-vector 
information in the case of an inter-AS routing protocol such as BGP-4 (Border 
Gateway Protocol 4) [5]. 

In this paper, we propose a method to realize the price-based inter-AS routing 
scheme. When an AS has a request to route an inter-AS flow, it can select an 
inter-AS route with the lowest price to improve its profit by this routing scheme. 
Next, we propose a cost-dependent pricing scheme suitable for the price-based 
inter-AS routing scheme. The cost-dependent pricing scheme can reduce the frequency 
of exchanging price information between ASes. However, in the cost-dependent pricing 
scheme, the profit in each AS depends on the distribution of path costs in that AS. 
We propose a routing policy for ASes with narrow ranges of path costs to improve 
their profits efficiently and verify its effect using a simple routing model. 



2 Price-Based Inter-AS Routing Scheme 

The price-based inter-AS routing scheme requires the exchange of route price 
information between ASes. For example, price-vector information needs to be 
exchanged in addition to the path-vector information in the case of an inter-AS 
routing protocol such as BGP-4 [5]. Fig. 1 explains the price-based inter-AS routing 
scheme using a distance-vector type routing protocol. Here, the price of a path is 
defined as the usage fee per unit of contracted bandwidth and per unit of duration 
time charged for flows which traverse that path. Of course, the price of a path 
depends on the class of service that the flow belongs to. 

[Figure 1; surviving label: F2 = (P2 + P3) * B * T] 

Rd : Distance of inter-AS route    dx : Distance of each AS 
Px : Path price in each AS         Fx : Flow fee paid to each AS 
B : Bandwidth of flow              T : Duration time of flow 

Fig. 1. Price-based inter-AS routing scheme 

The price of a path in each AS (Px) corresponds to the sum of the path cost and the 
profit for that AS. It can be regarded as the distance of that AS (dx) in a 
distance-vector type routing protocol. Thus, the distance of each route (Rd) becomes 
the total sum of the path prices in all the ASes composing that route. Each AS floods 
the total distance, i.e., the sum of its own distance and the route distance received 
from the neighboring downstream AS, to the neighboring upstream AS. In other words, 
each AS calculates the sum of the path price in that AS and the route price received 
from the neighboring downstream AS. If this newly calculated value is less than the 
total route price calculated previously, the AS floods this value to the neighboring 
upstream AS as the new total route price. Each AS holds a price-vector table 
including information on each destination AS and the least total route price to that 
AS. Each AS can select an inter-AS route with the lowest price by this routing scheme 
when it has a request to route an inter-AS flow. In this way, each AS can reduce the 
cost caused by the per-flow fee that it has to pay the downstream AS, and can improve 
its own profit. 
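
The flooding rule described above can be sketched as a small distance-vector computation. This is a minimal sketch, not the authors' implementation: it assumes one fixed path price per AS, a static topology given as a hypothetical `neighbors` dictionary of downstream neighbors, and synchronous iteration in place of real message exchange. The function name `least_route_prices` is illustrative. 

```python
import math


def least_route_prices(neighbors, path_price, destination):
    """Return, for every AS, the least total route price it would advertise
    upstream: its own path price plus the cheapest downstream route price."""
    advertised = {as_id: math.inf for as_id in neighbors}
    # The destination AS only charges for its own path.
    advertised[destination] = path_price[destination]
    changed = True
    while changed:
        changed = False
        for as_id in neighbors:
            if as_id == destination:
                continue
            for downstream in neighbors[as_id]:
                candidate = path_price[as_id] + advertised[downstream]
                # Re-advertise only when the new total price is lower.
                if candidate < advertised[as_id]:
                    advertised[as_id] = candidate
                    changed = True
    return advertised


if __name__ == "__main__":
    # Hypothetical topology: S can reach D via A or via B.
    neighbors = {"S": ["A", "B"], "A": ["D"], "B": ["D"], "D": []}
    path_price = {"S": 1.0, "A": 5.0, "B": 3.0, "D": 2.0}
    print(least_route_prices(neighbors, path_price, "D"))
```

In this hypothetical topology, S would route via B, because the total price advertised by B is lower than the one advertised by A. 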

Each AS needs to pay the per-flow fee corresponding to the total price charged by 
all the downstream ASes that convey the flow in sequence. The per-flow fee paid by 
each AS is distributed to all of its downstream ASes. Fig. 1 also shows a method for 
distributing the per-flow fee to the individual downstream ASes. In this method, each 
AS claims the per-flow fee (Fx) corresponding to the total price charged by itself and 
all of its downstream ASes from the neighboring upstream AS. Next, each AS pays the 
per-flow fee claimed by the neighboring downstream AS to that AS. In other words, 
each AS retains a part of the per-flow fee received from the neighboring upstream 
AS, and pays the remaining part of the per-flow fee to the neighboring downstream 
AS. The per-flow fee which each AS should claim from the neighboring upstream AS 
can be calculated easily using the price-vector table that each AS holds. Of course, 
the per-flow fee is accumulated, e.g. over a month, and the claims and payments 
of the per-flow fee are settled once per month between two ASes. 
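
The fee distribution described above can be made concrete with a short calculation that follows the form of the label F2 = (P2 + P3) * B * T in Fig. 1. The sketch below is illustrative only: the function names and the example numbers are assumptions, and monthly accumulation is left out. 

```python
def flow_fee_claims(path_prices, bandwidth, duration):
    """Fee Fx that each AS on the route claims from its upstream neighbor:
    (its own path price + all downstream path prices) * B * T."""
    claims = []
    downstream_total = 0.0
    for price in reversed(path_prices):
        downstream_total += price
        claims.append(downstream_total * bandwidth * duration)
    claims.reverse()
    return claims


def retained_fees(claims):
    """Part of the received fee each AS keeps after paying its downstream AS."""
    kept = []
    for i, claim in enumerate(claims):
        paid_downstream = claims[i + 1] if i + 1 < len(claims) else 0.0
        kept.append(claim - paid_downstream)
    return kept


if __name__ == "__main__":
    # Hypothetical route of three ASes with path prices P1, P2, P3.
    claims = flow_fee_claims([2.0, 3.0, 4.0], bandwidth=10.0, duration=5.0)
    print(claims)                 # fees claimed by AS1, AS2, AS3
    print(retained_fees(claims))  # what each AS keeps
```

Each AS ends up retaining exactly its own path price multiplied by B and T, which is the intended outcome of the hop-by-hop claim and payment. 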
Fig. 2. Cost-dependent pricing in two ASes (axis label: path price) 



3 Pricing Scheme in Each AS 

State-dependent pricing can be considered as a pricing scheme for each path in the 
AS [6] [7]. In the state-dependent pricing scheme, the price generally increases as 
path utilization increases. In this way, congestion can be avoided and users are 
given incentives to select a route with the highest QoS. However, the price-vector 
information needs to be exchanged between ASes frequently because the price of a 
path varies according to the change of its utilization. 

Here, we propose cost-dependent pricing suitable for the price-based inter-AS 
routing scheme. In the cost-dependent pricing scheme, the price is slightly reduced 
when the path cost is small. As is shown in Fig. 2, this pricing scheme gives users 
incentives to select a route with low cost, and thus ASes can obtain larger profit. 
Here, we consider the profit obtained from a flow per unit of bandwidth and per unit 
of time. Therefore, the profit can be defined as the difference between the path 
price and the path cost. If the route cost reflects static QoS like the number of 
hops along the route, the cost-dependent pricing may give users incentives to select 
a route with higher static QoS. 

The path cost is generally constant. Therefore, the price of a path is also constant 
in the cost-dependent pricing. This means that the cost-dependent pricing is more 
suitable for the price-based inter-AS routing scheme because it can reduce frequency 
of price information exchanges between ASes. Hereafter, we investigate the price- 
based inter-AS routing scheme in the case where each AS adopts the cost-dependent 
pricing scheme. 
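
As a rough illustration of cost-dependent pricing, the sketch below maps a path cost to a path price by linear interpolation between the minimum and maximum values listed for each AS type in Table 1. The linear shape is an assumption made here for illustration; the text only states that the price is slightly reduced when the path cost is small. The function name is hypothetical. 

```python
def cost_dependent_price(cost, cost_range, price_range):
    """Map a path cost to a path price by linear interpolation
    (assumed shape) between the range endpoints."""
    c_min, c_max = cost_range
    p_min, p_max = price_range
    fraction = (cost - c_min) / (c_max - c_min)
    return p_min + fraction * (p_max - p_min)


if __name__ == "__main__":
    # Type-2 AS from Table 1: cost range 3.0..6.0, price range 4.0..6.0.
    for cost in (3.0, 4.5, 6.0):
        price = cost_dependent_price(cost, (3.0, 6.0), (4.0, 6.0))
        print(cost, price, price - cost)  # cost, price, profit
```

Because the price range is narrower than the cost range, the profit (price minus cost) is largest on the cheapest paths, which is the incentive the scheme relies on. 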
4 Routing Policy for ASes 

In the cost-dependent pricing scheme, the profit that each AS can obtain may be 
restricted by the distribution of path costs in that AS. For example, let us 
consider an inter-AS route traversing two ASes with different ranges of path costs, 
as is shown in Fig. 2. The average path costs of AS1 and AS2 are identical, and the 
profit levels in both ASes are also identical. Though AS1 and AS2 have the same 
price level, AS2 has a wider range of path costs and thus a wider range of path 
prices compared with AS1. It is assumed that the path costs in both ASes follow the 
uniform distribution within their ranges. 

In the case of Fig. 2, AS2 has a larger influence on the total price of the inter-AS 
route. In other words, the inter-AS route is usually selected because of a low path 
price in AS2. This means that AS1 cannot obtain a large profit compared with AS2. On 
the other hand, AS1 cannot expand its price range without limit because the price 
range must at least be smaller than the cost range. 

We propose a routing policy for ASes with narrow ranges of path costs to improve 
their profits efficiently. In this routing policy, each AS prohibits flows from which 
it cannot obtain more profit than the “shadow price” [8]. Here, the shadow price 
means the profit that the AS can expect to obtain by using the unutilized resource 
resulting from rejection of the flow. In other words, each AS permits only flows by 
which it can increase its long-term profit. 

The profit from a flow is generally constant in each AS because the path cost, and 
thus the path price, for that flow is invariable in each AS. On the other hand, the 
shadow price in each AS depends only on the flow arrival rate and the utilization 
rate of the path that the flow traverses. This means that flows traversing a path 
are prohibited when the utilization of that path is larger than a threshold given in 
advance. In other words, each AS floods a prohibition signal to other ASes only when 
the path utilization exceeds the threshold, and floods a permission signal only when 
the path utilization falls below the threshold. This threshold corresponds to the 
path utilization rate at which the profit obtained from the path and the shadow 
price of the path are identical. 

Because each AS floods signals only when the path utilization crosses the 
threshold, this routing policy does not increase frequency of information exchanges 
between ASes compared to the state-dependent pricing scheme. Prohibiting flows 
from traversing a path corresponds to regarding the price of that path as infinite. For 
this reason, the proposed routing policy can be realized within the framework of 
price-vector type inter-AS routing protocol. 
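
The signalling behaviour described above can be sketched as a small state machine per path. The class below is a hypothetical illustration, not part of the paper: the names `PathPolicy`, `observe` and `advertised_price` are assumptions, and the threshold is taken as given rather than derived from the shadow price. 

```python
import math


class PathPolicy:
    """Tracks one path and emits a signal only when the utilization
    crosses the pre-computed threshold (names are illustrative)."""

    def __init__(self, price, threshold):
        self.price = price
        self.threshold = threshold
        self.prohibited = False

    def observe(self, utilization):
        """Return "prohibit" or "permit" on a threshold crossing, else None."""
        if not self.prohibited and utilization > self.threshold:
            self.prohibited = True
            return "prohibit"
        if self.prohibited and utilization < self.threshold:
            self.prohibited = False
            return "permit"
        return None  # no crossing, so nothing is flooded

    def advertised_price(self):
        # A prohibited path is treated as having infinite price.
        return math.inf if self.prohibited else self.price


if __name__ == "__main__":
    policy = PathPolicy(price=5.0, threshold=0.8)
    for utilization in (0.5, 0.85, 0.9, 0.7):
        print(utilization, policy.observe(utilization), policy.advertised_price())
```

Because a signal is produced only on a crossing, a path that stays above or below the threshold generates no additional routing messages. 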

Fig. 3 shows an example of the proposed routing policy. In Fig. 3 (a), the ingress 
gateway is the bottleneck resource for each path, and the utilization of the ingress 
gateway determines the shadow prices of all the paths. If flow arrival rates of all the 
paths are identical, the shadow price of each path also becomes identical. Fig. 3 (b) 
shows the relationship between the shadow price and the profit obtained from each 
path. When the gateway utilization exceeds threshold for a path, flows are prohibited 
from traversing that path because that path gives only profit less than the shadow 
price. 
(a) Example of internal structure of an AS 

(b) Example of relationship between profit from each path and shadow price 

Fig. 3. An example of routing policy 



5 Performance Evaluation of Routing Policy 

5.1 Performance Evaluation Model 

Here, it is assumed that flow arrivals are random and that the flow duration follows 
a negative exponential distribution. Moreover, each inter-AS route that a flow 
traverses is assumed to be unchanged during the existence of that flow. This means 
that packets of a flow are conveyed on a fixed connection such as an LSP (Label 
Switched Path) [9]. From these assumptions, the value of the shadow price and the 
economic efficiency of the proposed routing policy can be analyzed using the policy 
iteration method derived from Markov decision theory [10] [11]. In the policy 
iteration method, the policy is updated repeatedly until it converges on the 
optimum policy. 
However, we have to update policies for more than one AS in the inter-AS routing 
model. There may exist no Nash equilibrium point in the inter-AS routing model. 
Moreover, the policy iteration method does not assure convergence on a Nash 
equilibrium point even if a Nash equilibrium point exists. In this paper, the routing 
policies for ASes are updated alternately and the update of policies continues until 
the total profit of all the ASes becomes maximum. In other words, the alternate 
policy update is finished if the total profit of all the ASes decreases by updating 
the policy. 
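
The stopping rule for the alternate update can be sketched generically. The code below is only a schematic under stated assumptions: `improve` and `total_profit` are hypothetical callbacks standing in for one policy-iteration step of a single AS and for the evaluation of the summed profit, and `max_rounds` is a safety bound added here. 

```python
def alternate_policy_update(policies, improve, total_profit, max_rounds=100):
    """Update one AS policy at a time, in turn, and stop as soon as an
    update would decrease the total profit of all ASes."""
    policies = list(policies)
    best = total_profit(policies)
    for round_index in range(max_rounds):
        as_index = round_index % len(policies)
        candidate = list(policies)
        candidate[as_index] = improve(as_index, policies)
        value = total_profit(candidate)
        if value < best:
            break  # the update reduced total profit: keep the previous policies
        policies, best = candidate, value
    return policies, best


if __name__ == "__main__":
    # Toy stand-ins: each policy is an integer threshold, total profit peaks
    # when the thresholds sum to 5.
    result = alternate_policy_update(
        [0, 0],
        improve=lambda i, p: p[i] + 1,
        total_profit=lambda p: -(sum(p) - 5) ** 2,
    )
    print(result)
```

The toy callbacks only demonstrate the control flow; in the model of this section, each update would be a policy-iteration step for one AS type. 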

In this paper, only several AS-disjoint inter-AS routes between a pair of source AS 
and destination AS are considered, as is shown in Fig. 4. Those routes are assumed to 
traverse the same types of ASes as each other and to have an identical bandwidth. 
Bandwidth utilizations in those routes are assumed to be independent of each other. 
Thus, the amount of calculation can be reduced. In Fig. 4, there exist m disjoint 
routes with n units of bandwidth between the source AS and the destination AS. Each 
route consists of k ASes. It is assumed that every flow requests a unit of bandwidth. 
Here, we denote the applied traffic intensity per unit of bandwidth by A. 

We consider five types of ASes with the same average path cost but with different 
ranges of path costs. Those ASes are assumed to have an identical profit level and an 
identical price level as a result of economic competition between ASes. Nevertheless, 
they have different ranges of path prices because they have different ranges of path 
costs. Table 1 shows the path cost range and the path price range in each AS type. 

The path cost in each AS is drawn randomly and uniformly from the path cost 
range of that AS when a new inter-AS flow arrives. In other words, each AS has a 
virtually infinite number of paths. All the paths in each AS are assumed to share a 
bottleneck resource, and the utilization of this bottleneck resource corresponds to 
the path utilization in all the paths. The path utilization in the ASes composing an 
inter-AS route is assumed to be identical, and this path utilization corresponds to 
the bandwidth utilization in that route. 
Table 1. Path cost and path price in each AS 

AS type    Path cost              Path price 
           Minimum    Maximum     Minimum    Maximum 
1          4.0        5.0         5.0        5.0 
2          3.0        6.0         4.0        6.0 
3          2.0        7.0         3.0        7.0 
4          1.0        8.0         2.0        8.0 
5          0.0        9.0         1.0        9.0 


Table 2. Composition of routes utilized for each experiment 

Experiment    AS types 
I             1 (+ 1 (+ 1)) + 2 
II            1+1+2+2, 1+1+2+3, 1+1+2+4, 1+1+2+5 



Table 2 shows the composition of the inter-AS routes utilized in the following 
experiments I and II. Experiment I evaluates the effect of the routing policy when 
the number of inter-AS routes and the number of ASes composing an inter-AS route 
vary. Experiment II evaluates the effect of the routing policy when various types of 
ASes compose each inter-AS route. 



5.2 Results of Experiment I 

Fig. 5 shows the relationship between the profit per unit of bandwidth in each AS and 
the number of routes m. In Fig. 5, each route consists of one type-1 AS and one 
type-2 AS (k = 2), and has 200 units of bandwidth (n = 200). The profit in the type-1 
AS is smaller than that in the type-2 AS because the path price range in the type-1 
AS is narrower than that in the type-2 AS. However, the type-1 AS can improve its 
profit by adopting the routing policy proposed in this paper. Fig. 5 shows the profit 
in each AS when the sum of the profits in the two types of ASes becomes maximum by 
the alternate update of the routing policies for the two ASes. 

If the rates of price change to cost change are identical in the two ASes, a route 
with the least cost can be selected by the price-based inter-AS routing scheme. For 
the purpose of comparing the effect of price adjustment with that of the routing 
policy, Fig. 5 also shows the profit in each AS when a route with the least cost is 
selected by adjusting the path prices in the type-1 AS. 
[Figure panels: (b) A = 0.8, (c) A = 0.9; axes: profit per unit of bandwidth versus number of routes (m)] 

Fig. 5. Profit versus number of routes 

Except for the type-1 AS in the “No policy” and the “Routing policy” cases, the 
profit in each AS becomes larger as the number of routes increases. This result is 
caused by the cost-dependent pricing scheme. In other words, an inter-AS route with a 
lower price can be selected as the number of routes increases, and this inter-AS 
route with a lower price can give more profit to each AS in the cost-dependent 
pricing scheme. 
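
The effect of the number of routes in the “No policy” case can be illustrated with a small Monte Carlo sketch. It is not the evaluation model of this paper: blocking, traffic intensity and the routing policy are ignored, prices are assumed to be a linear function of cost between the Table 1 endpoints, and the function names, trial count and seed are hypothetical. 

```python
import random


def linear_price(cost, cost_range, price_range):
    """Assumed linear mapping from path cost to path price."""
    c_min, c_max = cost_range
    p_min, p_max = price_range
    return p_min + (cost - c_min) * (p_max - p_min) / (c_max - c_min)


def mean_profits(m, trials=20000, seed=0):
    """Average profit of the type-1 and type-2 AS on the cheapest of m routes."""
    rng = random.Random(seed)
    type1 = ((4.0, 5.0), (5.0, 5.0))  # (cost range, price range) from Table 1
    type2 = ((3.0, 6.0), (4.0, 6.0))
    total1 = total2 = 0.0
    for _ in range(trials):
        best = None
        for _ in range(m):
            c1 = rng.uniform(*type1[0])
            c2 = rng.uniform(*type2[0])
            p1 = linear_price(c1, *type1)
            p2 = linear_price(c2, *type2)
            route = (p1 + p2, p1 - c1, p2 - c2)  # (route price, profit 1, profit 2)
            if best is None or route[0] < best[0]:
                best = route  # the flow takes the route with the lowest price
        total1 += best[1]
        total2 += best[2]
    return total1 / trials, total2 / trials


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, mean_profits(m))
```

Under these simplifications the type-1 price is constant, so route selection is driven by the type-2 price alone: the type-2 profit grows with m while the type-1 profit stays near its average, in line with the exception noted above. 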

By adopting the routing policy, the type-1 AS can improve its profit drastically 
while the profit in the type-2 AS is slightly reduced. This improvement in the profit 
becomes larger as the value of A becomes larger, compared with the improvement 
caused by the price adjustment. This is because the shadow price becomes larger as 
the applied traffic intensity increases. At this time, only flows giving larger 
profit are permitted by the routing policy in the type-1 AS. The probability that all 
the routes prohibit a flow is less than four percent in every case in Fig. 5. 

Fig. 6 shows effect of the routing policy when the number of type-1 ASes involved 
in each route varies. In Fig. 6, the number of routes is fixed at 10 and each route has 
200 units of bandwidth. Fig. 6 shows the profit in each type of AS when the total sum 
of profit in all the ASes becomes maximum by adopting the routing policy. As the 
number of type-1 ASes increases, effect of the routing policy is reduced because 
many type-1 ASes must share the effect. In other words, the improvement of profit in 
the type-1 AS decreases and the reduction of profit in the type-2 AS increases as the 
number of type-1 ASes increases. 
[Figure: curves for “No policy” and “Routing policy” at A = 0.7, 0.8 and 0.9 for the type-1 and type-2 ASes; m = 10, n = 200; horizontal axis: number of type-1 ASes (k-1), values 1 to 3] 

Fig. 6. Profit versus number of type-1 ASes 



5.3 Results of Experiment II 

Fig. 7 shows effect of the routing policy when the composition of routes varies. Each 
route is composed of two type-1 ASes, one type-2 AS, and one type-x (x = 2 ~ 5) AS. 
The number of routes is 10 and each route has 200 units of bandwidth. Fig. 7 shows 
the profit in each type of AS when the sum of profit excluding the type-x AS 
becomes maximum by adopting the routing policy. 

As is shown in Fig. 7, the effect of the routing policy is reduced as the path price 
range in the type-x AS expands. This is because the path price in the type-x AS 
dominates the whole route price when the path price range in the type-x AS is 
wide. At this time, the routing policy cannot compensate sufficiently for the profit 
in the type-1 AS. Since the type-2 AS has the middle price range, the profit in the 
type-2 AS also decreases by adopting the routing policy. However, this decrease is 
smaller than that in the type-x AS. 

In conclusion, the ASes with narrow ranges of path costs can improve their 
profits efficiently by adopting the routing policy, so long as no AS with extreme 
dominance over the whole route price exists. 





An Inter-AS Policy Routing and a Flow Pricing Scheme to Improve ASes’ Profits 685 



[Figure: vertical axis "Profit per unit of bandwidth", ticks from 0.3 to 1.0] 

Fig. 7. Profit versus composition of routes 
6 Conclusions 

We proposed a method to realize a price-based inter-AS routing scheme, where the 
price for a path in each AS is regarded as the distance of that AS. When an AS has a 
request to route an inter-AS flow, it can select an inter-AS route with the lowest 
price to improve its profit by this routing scheme. Next, we proposed a 
cost-dependent pricing scheme suitable for the price-based inter-AS routing scheme. 
The cost-dependent pricing scheme can reduce the frequency of exchanging price 
information between ASes. In the cost-dependent pricing scheme, the profit in each AS 
depends on the distribution of path costs in that AS. We proposed a routing policy 
for ASes with narrow ranges of path costs to improve their profits efficiently and 
verified its effect using a simple routing model. The ASes with narrow ranges of path 
costs can improve their profits efficiently by adopting this routing policy, so long 
as no AS with extreme dominance over the whole route price exists. 

The price-based inter-AS routing scheme needs to be evaluated using a more 
practical routing model. Game-theoretical analysis of the price-based inter-AS 
routing scheme is also necessary. These items are left for further study. 
Abstract. In this paper we describe how stigmergic techniques can be 
used in packet networks that offer soft QoS services. The problem we are 
interested in is the on-line version of computing routes to be established 
over a packet network when the number of constraints imposed by the 
service is more than one. We investigate the scheme of the algorithm and 
the issues around the characteristics of the constraints, and we give some 
simulation evidence of the working algorithm. 



1 Introduction 

In current IP networks the offered service is best-effort, as there are no guaranteed 
bounds on the end-to-end delivery of packets (in terms of delay, jitter, economic 
cost or any other metric). One of the challenges of the evolving IP technology 
is to offer such guarantees with the minimum cost for the network. Apart from 
the forwarding function that needs to be enhanced inside the router engines, 
the routing algorithms that compute the paths the traffic packets follow 
end-to-end have to take into account the type of constraints those packets require. 
Such ToS (Type of Service) or DSCP (Differentiated Services Code Point) aware 
algorithms would give benefits in terms of resource utilization at the core of the 
network. 

The version of the problem we are interested in is a variant of QoS-aware routing. 
The first main assumption is that the algorithm does not use any explicit 
knowledge of reservation of resources across network nodes. Such algorithms 
could operate both in a network that performs plain best-effort hop-by-hop 
forwarding and in one that uses Differentiated Services forwarding mechanisms. 
The second assumption concerns the form of the constraints and the expectations of 
the applications we consider across the QoS spectrum. At one edge of the 
spectrum there are applications that can operate well with best-effort service (e.g. 
e-mail and FTP). At the other edge there are applications that can benefit 
from resource reservations that would guarantee them their stringent margins 
of expected network performance (e.g. telephone calls, video-conferencing). In 
between there are applications that have less stringent expectations of 
end-to-end performance. Such applications would benefit from any possible 
optimization of their considered metrics, but they can also adapt. We term the area 
in the middle of the QoS spectrum soft QoS. 

In our case we consider a system that supports a single soft-QoS service 
(although the proposed algorithm can easily be extended to support more than 
a single service). We will consider two constraints. The first constraint 
is the average delay for packets. The second constraint is the available capacity 
of the network along an end-to-end connection. The routes are determined in 
a hop-by-hop manner. The problem of QoS-aware routing has been proven to 
be NP-complete. A number of algorithms that use heuristics have been 
proposed to solve relaxed versions of the problem. In this paper we 
examine a swarm intelligence approach to solving the problem. 



2 Swarm Intelligence in Networks 

Distributed algorithms have been presented for computing 
routes in traditional telephone networks and best-effort (symmetric or 
asymmetric) IP networks, based on the biological paradigm of ants’ foraging 
behaviour. The technique used by ant colonies to locate and transfer food supplies 
into their nest, or even to construct complex structures, has been termed stigmergy, 
and it has been a source of inspiration for computer scientists investigating 
discrete optimization problems like the TSP. 

The proper definition of stigmergy by Grassé is the following: “Stimulation 
of workers by the performance they have achieved”. So basically stigmergy is 
a positive feedback mechanism using chemical substances like pheromones to 
attract agents that themselves deposit these chemicals. That loop reinforces 
solutions selected by the majority of the biological agents, and even with random 
initial conditions (that is, with agents selecting among viable paths with equal 
probability) optimal solutions (shortest paths) can be found for transporting 
food back to the nest. This capacity for estimating shortest paths has been at 
the core of the algorithms suggested by various computer scientists. 

The agents instantiate themselves in the virtual space of routes, exploring 
various possibilities, until the system finds one that cannot be reinforced any 
more. A system that uses solely positive feedback could lock onto poor solutions 
rather than the global optima. There is a need for a negative feedback 
mechanism that would stop this unwanted “freeze” of the system. In the case of real 
pheromones this role is performed by physical evaporation. In the case of 
the routing algorithms, different algorithms have used different techniques. All 
of the algorithms share the use of virtual “chemicals” or “pheromones” to act as 
intermediates for stigmergy. These pheromones are usually treated as 
probabilities. 

The “swarm” of agents uses a number of feedback mechanisms to influence 
the final paths selected for forwarding packets. 
Fig. 1. Implicit negative feedback mechanism in operation among ant-like agents at a 
decision point. The volume of the pheromone along a path is affected by the number of 
agents that traverse it, which depends on the pheromone along the path and the metric 
value of the path. The shortest path is traversed on average by more agents: double 
the number of agents that follow the alternative path in this simple example, where 
the length of the shortest path is half the length of the alternative. 



Positive Feedback This mechanism is used to reinforce the use of one of the 
alternate paths. Upon the arrival of a Bant to a network node the Pm metric 
value measured by the corresponding Fant is used to produce a raw value 
r'. The r' value is used to increase the probability of the interface the Fant 
used to get to that node. 

Negative Feedback This mechanism is responsible for decreasing the 
probability of an alternate path being selected. There are two negative feedback 
mechanisms during the operation of a single swarm. The first is due to the 
use of the pheromone vectors as probability distributions. That means the 
sum of the pheromones should always be equal to 1. The increase of a 
pheromone by the positive feedback necessitates the decrease of all the 
others. The second negative feedback mechanism is implicit. The delay of a Fant 
in reaching a node, according to the metric value of the sub-path it has experienced, 
gives the advantage to the agents that have followed the best available 
sub-path up to that node. The corresponding Bant has the chance to increase 
the probability of its followed path being selected before those of the Fants 
that have followed an alternative route. The net effect is the reduction of the 
volume of ants following alternative routes (see Fig. 1). 
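
The interplay of positive feedback and the first (normalization) negative feedback can be sketched as follows. The update rule shown is an assumption: it is the reinforcement form commonly used in ant-based routing algorithms, and the text above does not give the exact formula. The names `reinforce` and `pick_interface` are hypothetical, and `r` stands in for the raw value r' derived from the measured metric. 

```python
import random


def reinforce(pheromones, chosen, r):
    """Raise the pheromone of the chosen interface by r * (1 - p) and scale
    all others by (1 - r), so the vector still sums to 1."""
    return [
        p + r * (1.0 - p) if i == chosen else p * (1.0 - r)
        for i, p in enumerate(pheromones)
    ]


def pick_interface(pheromones, rng):
    """Choose an outgoing interface with probability equal to its pheromone."""
    return rng.choices(range(len(pheromones)), weights=pheromones, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    vector = [0.25, 0.25, 0.5]          # one entry per node interface
    vector = reinforce(vector, chosen=0, r=0.2)
    print(vector, sum(vector))
    print(pick_interface(vector, rng))
```

Increasing one entry automatically decreases the others, which is the first negative feedback mechanism; the implicit second mechanism (timing of returning agents) is not modelled here. 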

The proposed algorithms tackle the problem of on-line routing for networks 
that support a single type of service that has a single constraint. In the telephone 
network the service is the establishment of a call, and the constraint is that the 
selected routes should minimize the probability of a call being rejected by the 
system. In the packet network case the service is the delivery of packets from a 
source to a destination in the “best-effort” sense. The constraint is that the routes 
selected should minimize the average delay the packets experience traversing the 
network. 

3 Multi Constraint Algorithm 

3.1 General Scheme 

The scheme we describe in this section is not based on the net effect of a
single virtual “Swarm” (also termed Ant Colony, or the set of agents) but on the
combined effect of multiple Swarms. For that reason we term our algorithm
Multi-Swarm. The main idea is to associate with each unique constraint-metric
pair $m$ a unique Swarm $Swarm_m$ that has its own defined behaviour, and with
each service a type of pheromone. The Swarms that are “stimulated” by
constraints that are included in a service definition change the unique pheromone
vectors that correspond to that service (see Fig. 2).




Fig. 2. Principle of operation for the Multi-Swarm Routing Algorithm. Both Swarms
affect the same pheromone, which is an expression of the probability of a path being
selected for the service.

Each swarm uses the idea of “virtual pheromones”. In traditional routing
tables a single entry is kept for every destination, indicating which of the
node’s interfaces should be used to forward the packet. The routing
tables that are used in our algorithm hold, for every destination node, a vector
whose size is the number of the node’s interfaces. The values held there are
restricted to the [0,1] interval and are termed pheromones, due to the way
they are updated by the algorithm’s agents. The packets of the routing protocol
are forwarded probabilistically. For forwarding user packets, these virtual
pheromones can be treated in two ways.

In stochastic forwarding the vector is treated as a probability distribution
over the interfaces for sending packets towards a particular destination. That
means that the interface that is selected for each packet to be forwarded towards
a specific destination follows that distribution. The deterministic way uses a
number of rules to determine the interfaces to be used. In the single-path
version of the problem a single interface should be selected (e.g. the
one with the maximum pheromone value). In the multi-path version a number of
interfaces are used, with hash functions determining which subsets of
packets moving towards a destination will use each of these interfaces.
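
The two forwarding modes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `flow_id` argument, the top-`k` selection and the use of CRC32 as the hash are all assumptions made for the example.

```python
import random
import zlib


def stochastic_next_hop(pheromones, rng=random):
    """Pick an interface index with probability equal to its pheromone value."""
    interfaces = list(range(len(pheromones)))
    return rng.choices(interfaces, weights=pheromones, k=1)[0]


def single_path_next_hop(pheromones):
    """Deterministic single-path rule: the interface with the largest pheromone."""
    return max(range(len(pheromones)), key=lambda i: pheromones[i])


def multi_path_next_hop(pheromones, flow_id, k=2):
    """Deterministic multi-path rule (assumed): hash a flow onto the top-k interfaces."""
    ranked = sorted(range(len(pheromones)), key=lambda i: pheromones[i], reverse=True)
    candidates = ranked[:k]
    # The same flow identifier always maps to the same interface.
    return candidates[zlib.crc32(flow_id.encode()) % len(candidates)]
```

Hashing on a flow identifier keeps the packets of one flow on one interface, which is one plausible way to realise "subsets of packets" in the multi-path case.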

There are two sets of virtual agents (also termed “swarms of ants”) that
are responsible for the update of the pheromone tables. Each swarm takes
decisions based on the single constraint it is associated with and updates the
pheromone routing table. There are two forms of implicit communication (via
the pheromone values). The agents of each swarm affect fellow agents that will
traverse the node at a future time. They also affect agents of other swarms that
will traverse the link.

For individual swarms we follow the ideas presented in 0 in order to deal 
with the asymmetric nature of traffic and both the statistic and non-stationary 
variations of the metrics used. The mechanism is extended to deal with non- 
additive metrics. 



3.2 Additive Metric 

The mechanism described in the following paragraphs takes as an example the
delay metric, but it can be applied to any additive metric. Every node with
id $k$ holds a list $Trip_k(\mu_i, \sigma_i)$ of estimates of the arithmetic mean
$\mu_i$ and the associated variance $\sigma_i$ for trip times from that node to all
others in the network. This data structure is the view a node has of the delay
for reaching any other node. The $Trip_k$ values are used, along with
measurements from the agents, when the nodes update the pheromone tables.

To update its data structures in an asymmetrical network every node has two
types of agents. The forward ant agent is spawned at regular intervals from
all possible nodes $s$ towards destination nodes $d$, which are selected randomly.
Every forward ant has a memory stack. The forward ants are forwarded towards
their destination using the pheromone tables and they use the same queues as
all other packets. Whenever a node receives a forward ant, it checks whether the
ant has arrived at this node before and, if it has (cycle detection), it removes
all the entries of the cycle from the ant’s stack. Then it pushes the arrival
time onto the agent’s memory stack.
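
The stack handling with cycle detection described above might look like the sketch below. The class and attribute names are hypothetical, and it assumes that the earlier visit to the repeated node is dropped together with the cycle before the new arrival time is pushed.

```python
class ForwardAnt:
    """Forward ant carrying a stack of (node_id, arrival_time) entries."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.stack = []

    def visit(self, node_id, arrival_time):
        """Record an arrival, first removing any cycle that ends at this node."""
        for idx, (visited, _) in enumerate(self.stack):
            if visited == node_id:
                # Cycle detected: forget the earlier visit and everything after it.
                del self.stack[idx:]
                break
        self.stack.append((node_id, arrival_time))
```

For example, visiting A, B, C, B, D in that order leaves the stack holding A, B and D, with B carrying the time of its second visit.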

When a forward ant reaches its destination it spawns a backward ant agent
$B_{s \to d}$, to which it transfers its memory, and then terminates. The backward
ant follows the reverse path indicated by the stack and uses the stack values to
update the $Trip_k(\mu_i, \sigma_i)$ data structure and the pheromone table of
every intermediate node.

When it arrives at the source of the initial forward ant it terminates. The
update rules for the pheromone table have to keep the sum of the goodness
values of every vector equal to 1 (see Equation 1).

$$\sum_{n \in N_k} P_{in} = 1, \quad i \in [1, N], \quad N_k = neighbors(k) \qquad (1)$$

The update rules follow from estimating a new parameter $r'$ whenever
a backward ant reaches a node, coming from a node $f$. This $r'$ value is used to
update the goodness values of the vector that is associated with the source node
of the backward ant. The entry of the vector that corresponds to the interface
that the associated forward ant used in its journey is increased (rule $r_+$ in
Equation 2) and the others are decreased (rule $r_-$ in Equation 3), where $P_{df}$
and $P_{dn}$ are the last probability values assigned to the neighbors of node $k$
for destination $d$.

$$r_+ = (1 - r') \cdot (1 - P_{df}) \qquad (2)$$

$$r_- = (1 - r') \cdot P_{dn}, \quad n \in N_k, \; n \neq f \qquad (3)$$
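
As a rough sketch of how Equations 2 and 3 keep a pheromone vector normalised, the code below applies both rules to one vector. The function names and the default `c = 2.0` are assumptions for illustration; `raw_r_prime` implements only the raw estimate of Equation 4, not the later correction and power-law steps.

```python
def raw_r_prime(trip_time, mean_trip_time, c=2.0):
    """Raw estimate of r' (Equation 4): T / (c * mu), saturated at 1."""
    ratio = trip_time / (c * mean_trip_time)
    return ratio if ratio < 1.0 else 1.0


def update_pheromones(p, f, r_prime):
    """Return a new vector: entry f grows by r_+, every other entry shrinks by r_-."""
    gain = 1.0 - r_prime
    return [
        value + gain * (1.0 - value) if n == f else value - gain * value
        for n, value in enumerate(p)
    ]
```

The increase of entry `f` is `(1 - r') * (1 - P_df)`, and the decreases of the other entries add up to the same amount, so the vector still sums to 1 after the update.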



The estimation of $r'$ involves a number of steps. Initially a raw estimate
of $r'$ is computed using Equation 4, where $T$ is the observed trip time
of the ant and $\mu$ is its mean value as stored in the list $Trip_k(\mu_i, \sigma_i)$.
This rule saturates out-of-range values to 1. The second step depends on the
comparison of $\sigma/\mu$ with an arbitrarily small threshold $\epsilon$. If
$\sigma/\mu$ is less than the threshold value, the algorithm considers that the
observations on the mean are stable. In this case the previously computed value
of $r'$ is decreased or increased by the quantity $S(\sigma, \mu, a)$ (as in PJ).

$$r' = \begin{cases} \dfrac{T}{c\mu}, \; c > 1, & \text{if } \dfrac{T}{c\mu} < 1 \\ 1 & \text{otherwise} \end{cases} \qquad (4)$$

In the other case the value is increased or decreased by the quantity
$U(\sigma, \mu, a')$ (as in 0), where $a' < a$. Finally the value obtained at the
second step is filtered through a power law and bounded in the interval [0,1]. The
reasoning for the second step is that the algorithm should discriminate between
the case where the traffic fluctuations follow the same statistics and the case
where these statistics change. In the first case poor $r'$ values are increased
and good $r'$ values are decreased. In the second case, where the algorithm cannot
consider $r'$ a reliable measure of goodness, it tries to find a solution by
amplifying the good $r'$ values and suppressing the poor ones. For more
observations on the behaviour
of Ant-like algorithms and the semantics of the parameter values look at 



3.3 Concave Metric 

The other constraint that we are interested in is the available bandwidth. This
constraint is not additive: the metric of a path is the minimum of the metrics of
the links it comprises, not their sum. The non-additive nature of the
metric necessitates another mechanism for updating the pheromones. Another
difference of this metric from delay is that it cannot be measured by the agent
itself. In the case of the delay metric the agent can register its departure and
arrival times and estimate the effective delay it experienced through the link.
This is not possible for available bandwidth. A resource monitor should be present
at either side of the link to report the link’s available bandwidth. Another
side-effect is that the arrival rate of the agents belonging to the swarm
corresponding to available bandwidth cannot be used in the same way as in the
mechanism for the delay metric.

Fig. 3. Topologies Used for Simulation Scenarios

1. The fact that the metric is concave reduces the necessary stack space to a
single cell that holds the minimum value of available bandwidth, S,
experienced by the agent over its path. So, to cope with the concave nature
of the metric, we reduce the number of values stored by the Forward Ant
of the Swarm during its journey through the network.

2. We introduce a resource monitor that the agent communicates with on its
arrival at a node. The resource monitor holds both average and deviation
values for the utilization of the links of all incoming interfaces.

3. Since the agents’ arrival rate, used in the same way as for the delay swarm,
is not indicative of how much available bandwidth they have encountered,
we introduce an artificial delay at each node to restore the validity of that
information. The Forward agents are delayed in a similar way to the ABC
algorithm in [Q . The proposed delay for an agent traversing a link with spare
capacity S is given in Equation 5. The value of r’ is computed by the formula
in Equation 6, where C is the capacity of the link.



D = c*e~‘^-^ (5) 

+ 0 (0 
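
The single-cell memory of item 1 and the artificial delay of item 3 can be illustrated with the sketch below. The exponential form of the delay and the constants `c` and `decay` are assumptions chosen only so that more spare capacity gives a shorter delay; they are not taken from Equation 5, and the function names are hypothetical.

```python
import math


def path_bottleneck(spare_capacities):
    """Single-cell memory: keep only the minimum spare capacity seen along the path."""
    bottleneck = math.inf
    for spare in spare_capacities:
        bottleneck = min(bottleneck, spare)
    return bottleneck


def artificial_delay(spare, c=1.0, decay=0.1):
    """Assumed delay rule: decays exponentially as the spare capacity grows."""
    return c * math.exp(-decay * spare)
```

Delaying ants on poorly provisioned links means that ants arriving early have, on average, travelled over links with more spare capacity, which restores the meaning of the arrival rate.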

In order to test the operation of our scheme we have simulated a
single-constraint version of the algorithm over different topologies with uniform load.

4 The Simulation Scenario

In our simulation we use three simple network topologies to estimate how
different the behaviour of the Ant-like algorithm is from that of a Link State
algorithm at steady state. As an example of a Link State algorithm we used
LS, which uses Dijkstra’s algorithm internally to compute routes. For comparison
we also use a static routing algorithm called Session routing.

The topologies considered are a Tree, a Cycle and a General graph. The Tree
is a topology that offers no alternative routes for any node in the network. The
Cycle topology is the other extreme, offering every node as many alternative
routes as it has interfaces. The General Graph example is an
intermediate situation where the nodes have connectivity between 2 and 3. The
topologies are shown in Figure 3.

The LS and the Session algorithms have no parameters associated with them.
The Ant-like algorithm has a number of parameters associated with its
discovering properties, namely $c$, $a$, $a'$, $\epsilon$ and $h$ (the exponent of
the power law). In our simulations we kept the same parameter values as in [2].
In our scenario the traffic is generated by a number of FTP sessions established
between every possible source-destination pair. We vary the number of active
sessions between source and destination pairs according to a uniform distribution
with minimum parameter $a = 0$ and a varying maximum parameter $b$. Because we
are interested in the steady-state performance of the algorithm, all connections
start simultaneously 100 sec after the beginning of the simulation. Every
simulation lasts for 100 sec.

5 Statistical Methodology and Measurements 

In order to validate our results statistically we repeated every experiment
five (5) times with a different seed. We selected the seed values based on the
internal seed values that the ns-2 simulator uses when the user selects a
heuristic way to set them. The distances between the random seeds are close to
one million. We are interested in the total number of packets that have been
successfully forwarded under a routing scheme. To measure that, for every
simulation we sample the number of successfully acknowledged packets from the
TCP layer every 0.05 seconds during the 100 sec long period in which the
sessions are active.

For steady-state behaviour we are interested in averaging out the behaviour
of TCP sessions that is due to the initial conditions and the Slow-start phase.
To achieve that we also sample the current window size of the TCP layer every
0.05 seconds. In our simulations we use TCP Tahoe. In this version TCP increases
the window size exponentially during its Slow-start phase and linearly during
the congestion avoidance phase. We are interested in identifying the end of the
first Slow-start phase, the one that is due to the beginning of the simulation.
We use the differential of the window size for that purpose: we discard the
initial part of our statistical sample, up to the point where the
differential is stable.
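
One possible reading of "stable differential" is sketched below: the sample is cut at the first point from which the difference between consecutive window sizes stays within a tolerance for several samples in a row. The function name, `tolerance` and `run_length` are assumptions; the text does not state the exact stability criterion.

```python
def steady_state_start(window_sizes, tolerance=1.0, run_length=5):
    """Index of the first sample from which the window differential stays stable.

    Returns None when no stable run of the requested length is found.
    """
    diffs = [b - a for a, b in zip(window_sizes, window_sizes[1:])]
    for start in range(len(diffs) - run_length + 1):
        run = diffs[start:start + run_length]
        if max(run) - min(run) <= tolerance:
            return start
    return None
```

With exponential growth the differential keeps changing from sample to sample, while during congestion avoidance it is roughly constant, so the first stable run marks the end of Slow-start under this criterion.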

To compare the behaviour of the algorithms we estimate the total throughput
of each experiment for each algorithm. We estimate the average of the
total throughput over the different samples. We also use the advanced
Anderson-Darling ($A^2$) test rather than the commonly used Kolmogorov-Smirnov
test of normality (see m for reasoning) to check whether the total throughput
measurements fit the normal curve. We also estimate the 95% confidence interval
for every scenario as we increase the average load.

Fig. 4. Comparison in terms of Total Throughput for the Tree Topology



6 Results 

All our Anderson-Darling tests of normality showed a reasonably good fit of
our data to the normal distribution. That means that we can use the $z_{.025}$
value for estimating the 95% confidence interval for our data. The measurements
are summarized in Figure 4, Figure 5 and Figure 6. On the y-axis we show
the achieved throughput in Mbits/sec for a specific scenario. On the x-axis we
show the maximum number of active sessions for each source-destination pair in
the network. The y-axis error bars show the 95% confidence intervals of the
average total throughput over the 5 samples we have for every scenario. The
results for the other topologies follow the same trends.

We can see that the routing schemes achieve similar throughput. The averaged
total throughput of the network always lies within the 95% confidence
interval zone.
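
For reference, a normal-approximation 95% confidence interval over a handful of throughput samples can be computed as in the sketch below. The function name and the sample values used in any call are illustrative, not measurements from the paper.

```python
import math
import statistics

Z_025 = 1.96  # standard normal quantile for a two-sided 95% interval


def confidence_interval_95(samples):
    """Return (low, high) for the mean of the samples using the z_.025 value."""
    mean = statistics.mean(samples)
    half_width = Z_025 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width
```

With only five samples per scenario the interval is wide, which is consistent with the routing schemes being hard to separate statistically.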

Fig. 5. Comparison in terms of Total Throughput for the Cycle Topology

Fig. 6. Comparison in terms of Total Throughput for the Graph Topology

7 Summary

We have presented the Multi-Swarm algorithm, an Ant Colony inspired algorithm
to compute routes based on more than a single constraint. The algorithm
can also be used as a protocol that collects network statistics, using for that
purpose the capabilities of the agents. Although no capabilities of active
networking or of an agent platform are necessary for such a scheme, it can
physically be integrated into such network management architectures. We have
tested its behaviour in steady state under uniform TCP traffic, in terms of the
achieved throughput. The results show similar performance to that of Link-State
algorithms, which cannot scale to more than two independent constraints
(as in jS|, |Z1, 0|) and suffer when the link-state update frequency is large.

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Abstract. A virtual private network (VPN) functionality should in- 
clude a performance guarantee provided to the customers. To provide 
guaranteed services, the network provider allocates appropriate capac- 
ities to multiple virtual backbone networks such that the underlying 
network can be shared among them. As VPN users are demanding reli- 
able and dynamic allocation of capacities, recently the capacity resizing 
approach has been considered as a cost efficient way of providing virtual 
network services. We propose a new scheme for dynamic allocation of vir- 
tual link capacities. The allocated capacities are adjusted dynamically 
according to the users’ requests such that their capacities are increased in 
a fair manner and the total reservation does not overwhelm the underly- 
ing network. Depending on the network’s status and allocation policy, a 
virtual link may increase or decrease its capacity, for example, for a mon- 
etary incentive. VPN users send request packets whenever they want to 
resize their capacities, and the network handles them in an efficient and 
fair way. The simulation and analytic results show that our scheme is
simple and robust and that the link capacities are allocated in an efficient
and fair manner.



1 Introduction 

A virtual private network (VPN) service is likely to be used by customers as a
replacement for networks constructed using private lines; thus its functionality
should include the performance guarantee provided to those customers. As
many VPN customers’ applications will not tolerate the delay and packet loss of
the public Internet, their virtual backbone links require some performance
assurances. Such virtual links can be realized by packet scheduling algorithms,
label switched paths, or other layer 2 tunneling mechanisms. Most of the previous
VPN mechanisms, however, paid much less attention to resource management
issues. In order to achieve statistical multiplexing in the provider’s network,
and thus increase the utility of the underlying network and give the VPN users
some economic benefits, the capacity of virtual links should be dynamically
adjusted. We expect that it is difficult for VPN customers to specify their QoS
requirements exactly, and they might want to change the capacities of the
virtual backbones occasionally, or in some cases frequently. A couple of
approaches to dynamic resource management in VPNs can be found in the literature:
in [2], a new service interface is introduced to provide dependable and dynamic
connectivity between end points, and a similar approach that provides dynamic
capacity allocations to different classes has also been proposed.

In this paper, we develop a new capacity resizing mechanism for dynamically 
adjusting the virtual links. We are not concerned with the problems of how to 
provide the bandwidth guarantees or how to match users’ requirements and ser- 
vice levels. Rather we are interested in how to resize the virtual link capacities 
in a fair and efficient manner. In our scheme, VPN customers can request to 
resize their virtual links by sending control packets to the network manager, and 
according to the network status and the allocation policy, they are allowed to 
resize their virtual backbones. Thus after establishing a virtual link, the capac- 
ity is not permanent and can be dynamically updated according to the traffic 
demand of the VPN customer. The network provider handles the requests such 
that the total reservation does not overwhelm the underlying network and the 
residual capacity, if available, is fairly distributed among competing customers. 

This paper is organized as follows. The model is formulated in Section 2 and
its properties are examined and simulated in Section 3. In Section 4 we discuss
the extension to the case of multi-hop virtual links, and Section 5 concludes the
paper.

2 The Model 

We first describe the framework for the single link case. A set
$\mathcal{K} = \{1, \ldots, K\}$ of virtual links (VLs) is given, and they share a
link of capacity $C$. Initially capacity $A_i$ is allocated to virtual link $i$ such
that $\sum_{i \in \mathcal{K}} A_i \le C$. A virtual link (VL) can request to
increase or decrease its reservation, and the capacity resizing server (CRS), the
link management entity, adjusts the reservation according to the admission rule
described below. Presumably the VPN user will have some monetary incentive to
voluntarily decrease its reservation for the VL. We note that, while a VL may
represent the customer-pipe of a VPN, each endpoint of a VL may be characterized
by the hose [2], which specifies the capacity required for aggregate
incoming/outgoing traffic at the endpoint.

VL (Virtual link) behavior

A VPN user who owns a VL generates a special packet that contains the desired
capacity $s_i$ that the user wants to reserve and the current reservation $r_i$.
After the packet returns from the capacity resizing server (CRS), the VL can
increase (or decrease) its reservation to the value $s_i$ assigned by the CRS.
After setting $r_i = s_i$, if $r_i \le A_i$, that is, the current reservation is
less than or equal to the initially allocated capacity, then it can be maintained
until the user wants to change the capacity again. However, if $r_i > A_i$, the VL
generates the control packet periodically so that its capacity is dynamically
changed according to the network status and the CRS’s decisions.

Capacity Resizing Server (CRS) behavior

Define two disjoint sets of VPN users $\mathcal{K}_1$ and $\mathcal{K}_2$ as the
users with $r_i \le A_i$ and $r_i > A_i$, respectively. The CRS is responsible for
keeping track of the residual capacity of the link and for distributing it to
those VLs that want to change their reservations. It maintains three kinds of
information:

1. $\bar{C}$, the available capacity of the link; initially, it is given by
$\bar{C} = C - \sum_{i \in \mathcal{K}} A_i$, and as VLs update their capacities,

$$\bar{C} = C - \sum_{i \in \mathcal{K}} A_i + \sum_{i \in \mathcal{K}_1} (A_i - r_i)$$

2. $A_i$ for all $i \in \mathcal{K}$

3. The summation $R$ of the excess capacities of all VLs in $\mathcal{K}_2$; i.e.,
if we let $f_i = r_i - A_i$ for $i \in \mathcal{K}_2$, then

$$R = \sum_{i \in \mathcal{K}_2} f_i = \sum_{i \in \mathcal{K}_2} (r_i - A_i)$$

That is, $f_i$ is the amount of excess bandwidth above the initial allocation
$A_i$, and $R$ is the summation of $f_i$ for all VLs in $\mathcal{K}_2$. □

When the CRS receives a control packet, it compares $s_i$ in the packet with
$A_i$ and handles the request as follows.

Case 1 If $s_i \le A_i$, then the request is always accepted regardless of the
current reservations, and the information is updated by

$$R = R - f_i \cdot I(i \in \mathcal{K}_2) \quad \text{and} \quad \bar{C} = \bar{C} + ((A_i \wedge r_i) - s_i),$$

where $I(\cdot)$ is the indicator function and $a \wedge b = \min(a, b)$. Thus, if
the previous reservation of VL $i$ is greater than $A_i$, then both $R$ and
$\bar{C}$ are updated; otherwise, only $\bar{C}$ is changed.

Case 2 If $s_i > A_i$, the decision is made according to the value of $r_i$ as
follows. If $r_i > A_i$, then a new capacity is assigned by

$$s_i = s_i \wedge \{A_i + \alpha(\bar{C} - R + f_i)\}, \qquad (1)$$

where $0 < \alpha < 1$. The rationale behind Equation (1) and the role of $\alpha$
in distributing the residual capacity will be discussed in Section 3. Else if
$r_i \le A_i$, then first increase $r_i$ up to $A_i$ and let
$\bar{C} = \max\{0, \bar{C} - (A_i - r_i)\}$. Now we have the same case as above
and repeat the allocation by (1). Thus VL $i$ receives at least $A_i$, and $s_i$
is determined by the network status. $\bar{C}$ and $R$ are updated according to
the change of $r_i$.
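
The two admission cases can be put together in a small single-link sketch. This is an illustration under stated assumptions, not the authors' code: the class and attribute names are hypothetical, and the `max(share, 0.0)` clamp is an added safeguard so that a grant never falls below the initial allocation.

```python
class CapacityResizingServer:
    """Single-link CRS sketch following Case 1 and Case 2 of the admission rule."""

    def __init__(self, link_capacity, initial_allocations, alpha):
        self.A = list(initial_allocations)   # initial allocations A_i
        self.r = list(initial_allocations)   # current reservations r_i
        self.alpha = alpha                   # sharing parameter, 0 < alpha < 1
        self.available = link_capacity - sum(self.A)  # available capacity
        self.R = 0.0                         # total excess of VLs above A_i

    def handle_request(self, i, s):
        """Process a request for capacity s from VL i and return the grant."""
        A, r = self.A[i], self.r[i]
        f = max(r - A, 0.0)                  # current excess f_i of VL i
        if s <= A:                           # Case 1: always accepted
            self.R -= f
            self.available += min(A, r) - s
            self.r[i] = s
            return s
        if r <= A:                           # Case 2: first raise r_i to A_i
            self.available = max(0.0, self.available - (A - r))
        share = self.alpha * (self.available - self.R + f)
        granted = min(s, A + max(share, 0.0))  # Equation (1), with assumed clamp
        self.R += (granted - A) - f
        self.r[i] = granted
        return granted
```

With a link of capacity 200, two VLs provisioned at 50 each and `alpha = 0.6`, repeated requests for a large capacity drive both reservations towards 87.5, leaving part of the available capacity unallocated.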

After the admission, or even during the process of the second case, the total
reservation may exceed the link capacity such that
$\sum_{i \in \mathcal{K}} r_i > C$. However, since VPN users in $\mathcal{K}_2$ are
assumed to generate the control packets continuously and the excess capacity
allocated to those users is distributed from the residual capacity $\bar{C}$,
that situation may exist only for a very short period of time. □

Note that the control scheme described above is similar to the rate-based
ABR traffic control mechanism in ATM networks 0 . The distinct aspect of our
scheme is that the increase or decrease of VL capacity is motivated by users, and
the management entity accepts or rejects their requests according to the
admission policy. In the ABR control, by contrast, the flow rates are controlled
by the network, not by the users. Moreover, our scheme allows the users to reduce
their reservations below their initially allocated capacities.

3 Applying the Allocation Policy

3.1 Analysis of the Allocation Policy

In order to analyze the stability and properties of the allocation policy
described in the previous section, we only consider the VLs with $s_i > A_i$. In
this case, regardless of the current reservation $r_i$, the VL capacity is resized
by (1) and decisions are made such that $s_i > A_i$. Note that, if $s_i \le A_i$,
then the allocation is deterministic and no algorithm is required. Thus, if
$\bar{C}$ is given, the allocation problem is reduced to the case where the VLs in
$\mathcal{K}_2$ compete for the excess capacity. In a real situation, the VLs and
the residual capacity can change dynamically, and thus the allocation algorithm
should be stable and fair in distributing the resource among the VLs. As the
users in $\mathcal{K}_2$ are assumed to send control packets, $f_i$ is updated
periodically as long as $\bar{C}$ changes.

Now consider the system of optimization problems below:

$$\begin{aligned} &\text{maximize } U_i(f_i, i \in \mathcal{K}_2), \quad \forall i \in \mathcal{K}_2 \\ &\text{subject to } 0 \le f_i \le s_i - A_i, \end{aligned} \qquad (2)$$

where $U_i$ is a real function that is strictly concave with respect to $f_i$ and
has its optimum at

$$f_i = \alpha \Big( \bar{C} - \sum_{j \in \mathcal{K}_2, \, j \ne i} f_j \Big). \qquad (3)$$

Each user in the system of (2) tries to optimize its utility, and the
optimal policy of a user, given a constant $\bar{C}$ and the other users’ policies,
is always determined by (3). The modeling and analysis of a system of
optimizations applied in the area of communication networks was studied in [3],
and it is known that a resource allocation problem such as the above has a unique
equilibrium such that all users are satisfied at that point.

Now consider our resizing scheme. If the CRS assigns $f_i$ according to (3)
and the constraint of (2), then it corresponds to the optimal policy of virtual
link $i$. Thus, if the CRS determines $f_i$ for all users in $\mathcal{K}_2$ in the
same way, then it corresponds to the situation formulated in (2). According to
the unique equilibrium property of the model, it follows that (1) has a unique
solution which is fair to all users in $\mathcal{K}_2$. Thus, the allocation is
fair and stable as long as $\bar{C}$ remains constant.

Moreover, the optimal solution can be achieved dynamically using Gauss-Seidel
type iterations; that is, the network handles users’ requests one at
a time, either synchronously or asynchronously. It is known that if
some updates are made at the same time, then they might not converge to an
equilibrium [7]. However, in our scheme, even if the control packets from VLs
arrive at the same time, the capacity updates are computed one at a time and
they are guaranteed to converge. Also note that VLs need no knowledge of the
link capacity or of other VLs; as long as the control packets are delivered
without error, the transmission delay of the packets between the users and the
network management entity does not adversely affect the convergence.

We now analyze the capacity resizing achieved at the equilibrium. Assuming
VLs in $\mathcal{K}_2$ are competing for a residual capacity $\bar{C}$, let $f_i^*$
and $F^*$ be the allocated capacity for VL $i$ and the total allocation from
$\bar{C}$ at the equilibrium, respectively. The actual reserved capacity of VL $i$
is then $A_i + f_i^*$. The following theorem tells how much capacity each VL can
get out of $\bar{C}$.

Theorem 1. Let $s_i' = s_i - A_i$, where $s_i$, $s_i > A_i$, is the desired
capacity of VL $i \in \mathcal{K}_2$. If a residual capacity $\bar{C}$ is
distributed with the parameter $\alpha$, $0 < \alpha < 1$, then the capacity
$f_i^*$ allocated to $i \in \mathcal{K}_2$ at the equilibrium is determined such that

$$f_i^* = \begin{cases} s_i', & \text{if } s_i' < FS \\ FS, & \text{otherwise,} \end{cases}$$

where the unique point $FS$, a fair share, is given by

$$FS = \frac{\alpha}{1 - \alpha} (\bar{C} - F^*). \qquad (4)$$

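Proof. By applying the Kuhn-Tucker conditions |B|, we have the following
conditions for all $i$ at the equilibrium of (2):

$$\lambda_i^* \ge 0, \quad \mu_i^* \ge 0, \quad \lambda_i^* (f_i^* - s_i') = 0, \quad \mu_i^* f_i^* = 0 \qquad (5)$$

$$-\alpha \Big( \bar{C} - \sum_{j \ne i} f_j^* \Big) + f_i^* + \lambda_i^* - \mu_i^* = 0, \qquad (6)$$

where $\lambda_i^*$ and $\mu_i^*$ are Lagrange multipliers. We have the following
from (5):

$$f_i^* > 0 \Rightarrow \mu_i^* = 0, \qquad \lambda_i^* > 0 \Rightarrow f_i^* = s_i'.$$

Then (6) can be written as

$$f_i^* = \frac{\alpha}{1 - \alpha} (\bar{C} - F^*) - \frac{\lambda_i^* - \mu_i^*}{1 - \alpha}. \qquad (7)$$

Now let $FS = \frac{\alpha}{1 - \alpha} (\bar{C} - F^*)$; then from (7) and the
above implications,

$$s_i' < FS \Rightarrow f_i^* < FS \Rightarrow \lambda_i^* > 0 \Rightarrow f_i^* = s_i'.$$

The remaining case $f_i^* = FS$ can be proved by removing the users of the first
case and computing the unconstrained equilibrium as in (3): Define $\mathcal{A}$
and $\mathcal{K}_2'$ to be $\mathcal{A} = \{i \in \mathcal{K}_2 \mid s_i' < FS\}$ and
$\mathcal{K}_2' = \mathcal{K}_2 - \mathcal{A}$, respectively. Further, let

$$\bar{C}' = \bar{C} - \sum_{i \in \mathcal{A}} s_i'.$$

Then for $i \in \mathcal{K}_2'$,

$$f_i^* = \frac{\alpha}{1 + \alpha (K_2' - 1)} \bar{C}' = \alpha \bar{C}' - f_i^* \alpha (K_2' - 1), \qquad (8)$$

where $K_2' = |\mathcal{K}_2'|$. Summing up for all $i \in \mathcal{K}_2'$ yields
$F' = \alpha K_2' \bar{C}' - F' \alpha (K_2' - 1)$, where
$F' = \sum_{i \in \mathcal{K}_2'} f_i^*$. Obtaining
$K_2' = \frac{(1 - \alpha) F'}{\alpha (\bar{C}' - F')}$ from this equation and
substituting it into (8) completes the proof of the other case as follows:

$$f_i^* = \frac{\alpha}{1 - \alpha} (\bar{C}' - F') = \frac{\alpha}{1 - \alpha} (\bar{C} - F^*).$$

□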
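
Theorem 1 can be checked numerically with a short sketch that iterates the per-VL update one VL at a time and then compares the result with the fair share of Equation (4). The function names, the number of sweeps and the example demands are assumptions made for illustration.

```python
def equilibrium_allocation(demands, available, alpha, sweeps=500):
    """One-at-a-time sweeps of f_i = min(s'_i, alpha * (available - sum of other f_j))."""
    f = [0.0] * len(demands)
    for _ in range(sweeps):
        for i, demand in enumerate(demands):
            others = sum(f) - f[i]
            f[i] = min(demand, max(0.0, alpha * (available - others)))
    return f


def fair_share(f, available, alpha):
    """FS = alpha / (1 - alpha) * (available - F*), as in Equation (4)."""
    return alpha / (1.0 - alpha) * (available - sum(f))
```

For excess demands of 10, 100 and 100 competing for an available capacity of 100 with `alpha = 0.6`, the iteration settles at 10, 33.75 and 33.75: the small demand is fully served and the other two VLs each receive the fair share.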




Fig. 1. Dynamic updates of VL capacities: two VLs with constant available capacity 



3.2 Capacity Resizing in a Single Link 

In this section, we simulate the capacity resizing policy. Fig. 1 shows the
capacity resizing of two VLs. Initially the two VLs are provisioned with
$C_1 = 50$ and $C_2 = 50$. The situation assumes that the two VLs are competing
for the residual capacity $\bar{C} = 100$. They continue to send control packets
and, after a number of interactions with the CRS, their reservations converge to
the point where the summation of the excess capacities consumes $\bar{C}$. In this
example, the value of $\alpha$ is set to 0.6 so that some amount of link capacity
is reserved for possible future use. The algorithm converges dynamically even
when the available capacity of the link varies with time, as in Fig. 2. The
residual capacity varies between $\bar{C} = 100$ and $\bar{C} = 50$, possibly as a
result of other VLs’ capacity updates. The two VLs dynamically resize their
capacities, and the total reservation quickly follows the capacity changes. In a
real situation, the changes of VL capacity may take effect some time later due to
transmission delays. The link is almost fully utilized with $\alpha = 0.8$ in this
example. Note that, in the above figures, the total reservation never exceeds the
link capacity in spite of the fact that the CRS handles a user’s request
instantaneously.

Fig. 2. Dynamic updates of capacity reservations: the available capacity varies

4 Multiple Link Case 

In this section we discuss the extension of the capacity resizing scheme.
Generally, the virtual backbone of a VPN user consists of a number of multi-hop
virtual links on the provider’s network. Each VL of the virtual backbone has a
provisioned capacity which is allocated at the time of establishment of the VPN.
As in the single link case, the VPN user may want to increase or decrease the
capacity of its virtual backbone. It then generates a control packet and sends
it to the CRS, which decides the new capacity for the VL and sends back the
control packet. In this case, however, the CRS maintains the list of links that
each VL traverses. Again, any request of $s_i \le A_i$ from a VL is always
accepted, and if $r_i > A_i$ after the resizing, the control packets should be
generated periodically for dynamic resizing.

Let $\mathcal{V}$ be a finite set of nodes and
$\mathcal{L} \subseteq \mathcal{V} \times \mathcal{V}$ be a set of links whose
elements are unordered pairs of distinct nodes. $C_\ell$ is the bandwidth of a
link $\ell \in \mathcal{L}$, and $R_\ell$ is the summation of the excess capacities
of the VLs traversing the link. VL $i$ has a set $\mathcal{L}_i \subseteq \mathcal{L}$
of links that it is associated with, and reserves capacity $r_i$ through all links
in $\mathcal{L}_i$. The set of VLs which use link $\ell$ can be defined by
$\mathcal{K}_\ell = \{i \in \mathcal{K} \mid \ell \in \mathcal{L}_i\}$. Then,
$\mathcal{K}_{\ell 1}$ and $\mathcal{K}_{\ell 2}$ are the VLs in $\mathcal{K}_\ell$
with $r_i \le A_i$ and $r_i > A_i$, as in the single link model. We say that link
$\ell$ is a bottleneck link for a VL $i$ traversing $\ell$ if
$\bar{C}_\ell - R_\ell + f_i \le \bar{C}_{\ell'} - R_{\ell'} + f_i$ for every link
$\ell'$ that VL $i$ traverses; i.e., a bottleneck link of VL $i$ is a link that has
the smallest capacity available in $\mathcal{L}_i$. The CRS maintains and updates
the following four kinds of information on the network status:

1. $\bar{C}_\ell$, the available capacity of link $\ell$; initially, it is given by
$\bar{C}_\ell = C_\ell - \sum_{i \in \mathcal{K}_\ell} A_i$, and as the VLs update
their capacities,

$$\bar{C}_\ell = C_\ell - \sum_{i \in \mathcal{K}_\ell} A_i + \sum_{i \in \mathcal{K}_{\ell 1}} (A_i - r_i)$$

2. $A_i$ for all $i \in \mathcal{K}$

3. $R_\ell$: if we let $f_i = r_i - A_i$ for $i \in \mathcal{K}_{\ell 2}$, then
$R_\ell = \sum_{i \in \mathcal{K}_{\ell 2}} f_i$

4. $\mathcal{L}_i$ for all $i \in \mathcal{K}$

The admission policy of the CRS described in the previous section can be 
applied to the multiple link case with a small modification as follows. When the 
CRS receives a request from VL $i$, it looks up the list $\mathcal{L}_i$ and identifies a link $\ell$ 
whose $\hat{C}_\ell$ is smaller than or equal to that of the other links in $\mathcal{L}_i$. It is apparent that such 
an $\ell$ always exists, and the CRS performs the capacity allocation at the link $\ell$ for 
the VL $i$ as in the single link case: 

Case 1 If $s_i \le A_i$, then the request is always accepted and the information is 
updated by $R_\ell = R_\ell - \hat{r}_i \cdot I(i \in \mathcal{K}_{\ell 2})$ and $\hat{C}_\ell = \hat{C}_\ell + ((A_i \wedge r_i) - s_i)$. 

Case 2 If $s_i > A_i$, the decision is made according to the value of $r_i$ as follows. 
If $r_i > A_i$, then a new capacity is assigned by 

$$s_i = s_i \wedge \{ A_i + \alpha (\hat{C}_\ell - R_\ell + \hat{r}_i) \}, \qquad (9)$$

where $0 < \alpha < 1$. Else if $r_i \le A_i$, then first increase $r_i$ up to $A_i$ and let 
$\hat{C}_\ell = \max\{0, \hat{C}_\ell - (A_i - r_i)\}$. Now the capacity resizing is performed by (9). 
$\hat{C}_\ell$ and $R_\ell$ are updated according to the change of $r_i$. □ 
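
The two cases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `granted_capacity`, the tuple layout used for links, and the decision to leave the bookkeeping of the link state to the caller are all choices made here for illustration. The built-in `min` plays the role of the minimum operator in (9).

```python
def granted_capacity(request, provisioned, reserved, links, alpha=0.6):
    """Capacity the CRS would grant to one VL (hypothetical helper).

    request     -- s_i, the capacity the VL asks for
    provisioned -- A_i, the capacity provisioned at set-up time
    reserved    -- r_i, the capacity currently reserved
    links       -- list of (available, excess_sum) pairs, one per link the
                   VL traverses: the link's available capacity and R, the
                   summed excess of all VLs on that link
    alpha       -- share of the spare capacity handed out per request
    """
    # Case 1: a request at or below the provisioned capacity is always accepted.
    if request <= provisioned:
        return request

    # Only the bottleneck link (smallest available capacity) is examined.
    available, excess_sum = min(links, key=lambda link: link[0])

    if reserved > provisioned:
        own_excess = reserved - provisioned
    else:
        # First raise the reservation to A_i, which uses up available capacity.
        available = max(0.0, available - (provisioned - reserved))
        own_excess = 0.0

    # Equation (9): cap the request by A_i plus a fraction of the spare capacity.
    return min(request, provisioned + alpha * (available - excess_sum + own_excess))
```

For instance, with alpha = 0.6, a VL provisioned at 40 that asks for 100 over links with 50 and 100 units available is granted 40 + 0.6 * 50 = 70. If one of its links has no available capacity, it stays at 40 regardless of the other links.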

As a VL that traverses a number of links has the same reservation ri along 
the links, CRS may consider only a bottleneck link of the VL. That means the 
computational complexity of the allocation policy is independent of the number 
of links in the network or in a VL; thus, even when the size of a network grows, 
the work load of CRS remains the same. We simulate the policy for the example 
network shown in Fig. 3, which consists of four gateways connected via three 
links. Initially, the first and second links have a capacity of 100 and the third 
one 200. Starting from time 0, the capacity of the second link varies between 
150 and 200, resulting in a residual capacity between 50 and 100. Four VLs are 
established as in the figure, and the initial capacities of the VLs are provisioned 




Fig. 3. Example network of four switches on which four virtual links are established. 



as 20, 40, 40 and 60, respectively. Assume that all four VLs want to increase 
their temporal reservations. As the residual capacity of the second link varies, the 
three VLs which traverse the link may have chances of resizing; however, as VL 3 
is constrained by the first link, its capacity cannot be resized even if the second 
link is underutilized. Thus only the requests from VL 1 and 2 are eligible in 
this case. Fig. 4 shows the result of the capacity resizing procedure at the example 




Fig. 4. Capacity resizing of three virtual links at the second link 



network. The two graphs at the top depict the capacity changes of the second link. 
Note that these changes represent the available capacity on the link, possibly 
due to the capacity resizing of other VLs which are not shown in the network. 
The lower part of the figure shows the capacity updates of the three VLs which 
traverse the second link. As the residual capacity varies between 50 and 100, the 
capacities of VL 1 and 2 are increased equally and fairly, consuming the available 
capacity. In this simulation, the value of $\alpha$ is set to 0.6. In this figure, virtual 
link 3 keeps sending control packets hoping to increase its capacity; however, 
since its bottleneck link is the first link, which has no residual capacity, the CRS 
always sends back the requests with $s_i = r_i$ and the capacity does not change. 

5 Conclusions 

In this paper, we have proposed a new scheme for dynamical resizing of VL 
capacity based upon the users’ requests. A VPN user establishes a set of VLs 
each of which has an initial capacity. During operation, VPN users may resize the 
capacity of a VL by sending a request to the CRS. They decrease the capacity 
possibly due to a monetary incentive, and increase the capacity to accommodate 
the temporal traffic increase. If the set of links which a VL traverses has 
some residual capacity and thus the request is eligible, then the CRS computes 
a new capacity for the VL and returns the request. We have shown that 
our resizing policy is simple and robust, in that residual capacity is distributed 
fairly among competing VLs and the allocation converges fast for a varying link capacity. 
Moreover, the scheme is scalable in the sense that it is independent of the size 
of a network and the number of hops of a VL. 

Our scheme investigates the problem of fair and efficient allocation of resid- 
ual link capacity among only the competing users, not the problems of how to 
provide a bandwidth guarantee or how to match a user’s requirement. Several 
issues should be addressed further. The frequency of sending control packets to 
the CRS and its impact on the performance of the network were not considered in 
this paper, and as the VLs may keep sending control packets even when there 
is little or no residual capacity, a mechanism should be investigated that 
keeps the system stable in such cases. 
Abstract. A new approach, the random leader-based protocol (RLBP), is presented 
to overcome the problem of feedback collision in single channel multi-access 
wireless LANs for acquiring additional channels and reliability. It involves the 
selection of a leader in a multicast member group that acts as a representative 
for sending feedback to the sender using a random delay time. The leader sends 
nothing but acknowledgement (ACK) messages. For reliable multicasting, on 
erroneous reception of a packet, the leader sends nothing, which prompts the 
sender to retransmit. For erroneous packet reception at receivers 
other than the leader, our protocol allows negative acknowledgements (NAK) 
from these receivers to collide with the acknowledgement from the leader 
(assuming that the leader receives the packet correctly), thus destroying the 
acknowledgement and prompting the sender to retransmit the packet. Using an 
analytical model, it is shown that the proposed protocol obtains higher 
throughput than the delayed feedback-based protocol and the leader-based protocol, 
especially under lossy conditions. 



1 Protocols to Avoid Feedback Collision 

In multicast communications for microcell-based wireless network supporting mobile 
terminals, the base station communicates with a group of receivers. All 
communication is either directed towards the base station or directed away from the 
base station (base-to-terminal or base-to-terminals in the case of 
multicasting/broadcasting). Time is measured in terms of a basic unit called the slot. 
Thus, time evolves in discrete steps: 1st slot, 2nd slot, ..., nth slot, and so on. System 
events, like transmission/reception of a packet, occur at integer-valued slot times. It is 
important to ensure that all of the entities in a cell (the base and the terminals) 
identify the beginning and end of slots unambiguously and simultaneously. This is the 
synchronization problem, and for the purposes of this paper we assume that perfect 
synchronization is achieved. There are significant differences between wired and 
wireless LAN transmission media, which make it impossible to apply traditional 
wired-LAN MAC strategies like CSMA/CD to wireless LANs. In a multi-access wireless 
LAN, collision detection is not practical. This is because the dynamic range of the 
signals on the medium is very large, so that a transmitting station cannot effectively 
distinguish incoming weak signals from noise or its own transmission [1]. In order to 
prevent bandwidth loss due to collision detection (possibly due to an ACK/NAK) 
after the entire packet has been transmitted, a transmitter needs unambiguous and 
conclusive evidence that it has acquired the channel before starting a transmission. In 
the wireless context, this evidence can be provided by means of a handshaking 
mechanism implemented using short fixed-size signaling packets: Request-to-Send 
(RTS) and Clear-to-Send (CTS)[2][3]. 

We now briefly describe the RTS-CTS mechanism for unicast transmissions. 
When a base or a terminal transmits, it sends an RTS packet to the intended recipient. 
This RTS packet contains the length of the proposed transmission. If the recipient 
hears the RTS, it replies immediately with a CTS. The CTS also contains the length of 
the imminent data transmission. Upon hearing the CTS, the initiator goes ahead with 
the transmission. Any terminal overhearing an RTS defers all transmissions for an 
interval sufficient for the associated CTS to be sent and heard. Any terminal 
overhearing the CTS defers for the length of the oncoming data transmission. After a 
data packet is received, the recipient provides link-level ARQ feedback, by means of 
an ACK. The RTS-CTS mechanism also helps in combating the hidden terminal 
problem. When a transmitter about to transmit senses no carrier in its vicinity, it 
cannot conclude that the shared channel is unused, because another transmitter hidden 
from it may be transmitting at that instant. With the RTS-CTS mechanism, the hidden 
terminals can hear the CTS and defer using the channel. In this paper, we considered 
that all terminals in a cell are within range of one another and the base station. All 
terminals have a consistent view of what is going on in the cell and that there are no 
hidden terminals. A discussion on the impact of hidden terminals on our work and the 
means to deal with it can be found in [4]. The IEEE 802.11 Media Access Control 
standard uses the RTS-CTS exchange. It is important that the RTS-CTS control 
structure be retained when multicast functionality is overlaid. Consequently, when 
adding multicast functionality, we devise ways of extending the access control 
mechanism rather than modifying its basic structure. 

While the RTS-CTS mechanism described above for coordinating access to the 
channel and supplying link-level ARQ feedback works well enough for unicast 
transmissions, it runs into problems straight away in the context of multicasting. With 
the above protocol, each of the members in a multicast group would respond with a 
CTS to a multicast-RTS from the base, leading to a CTS collision at the base. A 
similar collision problem can also be expected with respect to the feedback (ACK or 
NAK) provided by the link-level ARQ mechanism. Standard probabilistic approaches 
can be used to manage the CTS collision problem. In the delayed feedback scheme, 
terminals hearing a multicast-RTS send a CTS with a random delay, hoping to avoid a 
CTS collision. We will also consider protocols based on this idea. To tackle the 
ACK/NAK collision problem, a contention-based approach is possible, where 
receivers contend for the channel to send feedback. However, the contention-based 
approaches suffer from problems of their own, as will be seen in subsequent sections. 
This motivated us to develop a new leader-based protocol that satisfactorily 
addresses these specific problems. 



1.1 Protocols 

In this paper, three generic protocols are considered: the delayed feedback-based 
protocol [5], the leader-based protocol [5] and the proposed random leader-based 
protocol. All of these protocols are for reliable multicasting over a multi-access 
wireless LAN, with a single sender, the base station, sending reliably to a group of 
receivers within a cell. We assume that the basic support for link level multicasting, 
such as the link level multicast address, is available at both the base-station and the 
receivers. The receivers that subscribe to the multicast address are considered to 
belong to the multicast group corresponding to that multicast address. 



1.1.1 Delayed Feedback-Based Protocol [5] 

In the delayed feedback-based protocol, the CTS collisions are sought to be avoided 
using a random timer. This protocol, termed DBP, is specified as follows: 

[A] Base → Receivers (Slot 1) 

1. Send multicast-RTS. 

2. Start a timer ( timeout period T), expecting to hear a CTS before the 
timer expires. 

[B] Receivers → Base 

1. On hearing RTS, start timer with an initial value chosen randomly from 
{1,2,. ..,L}. 

2. Decrement timer by 1 in each slot. 

3. If a CTS is heard before timer expires, freeze timer (CTS suppression). 

If no CTS is heard before timer expires, send CTS. 

[C] Base → Receivers 

If no CTS is heard within T, back off and go to Step A. 

If a CTS is heard within T (at a random time), start data transmission. 

After completing transmission, prepare to transmit the next packet and go to Step A 
(no waiting for feedback). 

The next step is executed only when a multicast transmission occurs in step C. 

[D] Receivers → Base 

If a packet is received without error, do nothing. 

If an error occurs, contend for the channel to send NAK. 
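
The access phase of DBP (steps A to C) can be simulated in a few lines. The sketch below is illustrative only and rests on a simplifying assumption that is not spelled out in the specification: if two or more receivers share the earliest timer value, their CTSs collide and the whole attempt fails, even if a later timer value would have been unique. The function name `dbp_round` and the use of a seeded `random.Random` instance are choices made here.

```python
import random


def dbp_round(num_receivers, L, T, rng):
    """Simulate one multicast-RTS attempt under DBP (hypothetical helper).

    Returns the slot (1..T) at which the base hears a CTS, or None if the
    attempt fails because of a timeout or a CTS collision.
    """
    # Step B.1: every receiver draws its initial timer value from {1, ..., L}.
    timers = [rng.randint(1, L) for _ in range(num_receivers)]
    first = min(timers)

    # Step C: nothing was sent before the base's timeout expired.
    if first > T:
        return None

    # Assumption: a tie on the earliest slot is a CTS collision and the
    # attempt is lost.
    if timers.count(first) > 1:
        return None

    # Step B.3: all other receivers hear this CTS and freeze their timers.
    return first
```

Calling `dbp_round(30, 60, 10, random.Random(0))` repeatedly gives a feel for how often an attempt succeeds for a given choice of L and T.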

1.1.2 Leader-Based Protocol [5] 

The leader-based protocol assumes that one of the receivers of the multicast has been 
chosen to be a leader for the purpose of supplying the CTS and ACK in response to 
the RTS and to data packets of length h, respectively. The leader-based error recovery 
protocol (LBP) is specified as follows: 

[A] Base → Receivers (Slot 1) 

Send multicast-RTS. 

[B] Receivers → Base (Slot 2) 

Leader: If ready to receive data, send CTS. 

If not ready to receive data (e.g., due to insufficient buffers), do nothing. 

Others: If ready to receive data, do nothing. 

If not ready to receive data, send NCTS (Not Clear to Send). 

[C] Base → Receivers (Slot 3) 

If a CTS was heard in slot 2, start multicast transmission. 

If no CTS was heard in slot 2, back off and go to Step A. 

The next step is executed only when a multicast transmission occurs in Step C. 

[D] Receivers → Base (Slot (h+3)) 

Leader: If a packet is received without error, send ACK. 

If an error occurs, send NAK. 

Others: If a packet is received without error, do nothing. 

If in error, send NAK. 

LBP uses both ACKs and NAKs from receivers as feedback to the sender. It makes 
an interesting use of collisions associated with one or more NAKs to ensure that the 
sender does not get a positive feedback if one or more group members received an 
erroneous transmission. 
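
The way LBP exploits collisions in step D can be made concrete with a small sketch. It assumes, as the description above suggests, that an ACK which overlaps with one or more NAKs cannot be decoded by the base, so the base treats anything other than a lone ACK as a reason to retransmit. The function name `lbp_feedback` and the string labels are illustrative.

```python
def lbp_feedback(leader_ok, others_ok):
    """Outcome of the feedback slot as seen by the base (hypothetical helper).

    leader_ok -- True if the leader received the data packet without error
    others_ok -- iterable of booleans, one per non-leader receiver
    """
    # Every frame sent in the feedback slot: the leader always sends one,
    # the other receivers only send a NAK when their copy was in error.
    frames = ["ACK" if leader_ok else "NAK"]
    frames += ["NAK" for ok in others_ok if not ok]

    # A lone ACK is decoded; anything else is a NAK or a collision, and
    # in both cases the base does not get positive feedback.
    return "ACK" if frames == ["ACK"] else "RETRANSMIT"
```

A single receiver in error is enough to suppress the positive feedback: `lbp_feedback(True, [True, False])` yields `"RETRANSMIT"`, while `lbp_feedback(True, [True, True])` yields `"ACK"`.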



1.1.3 Random Leader-Based Protocol 

We now present our random leader-based protocol for reliable multicasting over a 
multi-access wireless LAN. Both ACKs and NAKs are used to provide reliable 
transmissions. When the first multicast-RTS packet is sent from the base station, the 
DBP method is used to get the CTS packet from any of the member receivers. The 
timer for that receiver is then set to 0, and the receiver becomes a pseudo leader, 
which we call the random leader. 

The RLBP is specified as follows: 

[A] Base → Receivers (Slot 1) 

1. Send multicast-RTS. 

2. Start a timer (timeout period T), expecting to hear a CTS before the 
timer expires. 

[B] Receivers → Base 

1. On hearing RTS, start the timer with an initial value chosen randomly 
from { 1,2,...,L}. 

2. Decrement timer by 1 in each slot. 

3. If a CTS is heard before the timer expires, freeze the timer (CTS 
suppression). 

If no CTS is heard before the timer expires, send CTS. 

4. If a CTS is sent, set the timer=0 when the next packet comes. 

[C] Base → Receivers 

If no CTS is heard within T, back off and go to Step A. 

If a CTS is heard within T (at a random time), start the data transmission. 

After completing the transmission, prepare to transmit the next packet and go to Step 
A (no waiting for feedback). 

The next step is executed only when a multicast transmission occurs in step C. 

[D] Receivers → Base 

If the packet is received without error, send ACK. 

If an error occurs, contend for a channel to send NAK. 

Compared to the LBP, the base in the RLBP scheme does not need a complex 
mechanism to maintain the leader table or refresh it whenever the leader 
leaves. 
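
The leader handling in RLBP can be illustrated with a toy model. The class below is a sketch under assumptions made here, not the authors' implementation: the name `RlbpCell` and its methods are hypothetical, a tie on the earliest timer value is treated as a failed attempt, and a receiver that leaves simply disappears without telling anyone, which is the case the protocol is designed to tolerate.

```python
import random


class RlbpCell:
    """Toy model of RLBP leader selection (all names are hypothetical)."""

    def __init__(self, receivers, L, T, rng=None):
        self.receivers = set(receivers)
        self.L, self.T = L, T
        self.rng = rng or random.Random()
        self.leader = None  # receiver whose timer is currently pinned to 0

    def leave(self, receiver):
        # Nobody is notified: the base keeps no leader table to update.
        self.receivers.discard(receiver)

    def rts(self):
        """One multicast-RTS. Return (responder, delay), or (None, T) on failure."""
        if self.leader in self.receivers:
            # The leader's timer is 0, so it answers at once and the
            # other receivers suppress their CTS.
            return self.leader, 0

        # No leader present: fall back to the random-delay contention of DBP.
        self.leader = None
        timers = {r: self.rng.randint(1, self.L) for r in sorted(self.receivers)}
        if not timers:
            return None, self.T
        first = min(timers.values())
        winners = [r for r, t in timers.items() if t == first]
        if first > self.T or len(winners) > 1:
            return None, self.T  # timeout or CTS collision
        self.leader = winners[0]  # the CTS sender becomes the random leader
        return self.leader, first
```

After a successful `rts()` call the same receiver answers with delay 0 on every following call, until it leaves; the next `rts()` then elects a new random leader among the remaining receivers without any action by the base.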



1.2 Discussion 

In comparison to the LBP, a successful RTS-CTS exchange would take longer in both 
DBP and RLBP. This is because both DBP and RLBP have to deal with the possibility 
of CTS collisions. RLBP takes even longer for the first packet of a 
multicast transmission. As DBP and RLBP are NAK-based, a packet must be 
retained longer at the sender to ensure that most of the retransmission requests can be serviced. 
At the receiver, a larger buffer is required to hold out-of-order packets so that the 
upper layers receive an ordered delivery. Another problem with DBP is the choice of 
the right parameters for waiting times and the feedback probability. This choice 
depends on the number of group members, and the group members are not likely to 
have an estimate of the group size. It is possible for the sender to do this estimation 
and send out the right parameters with the RTS, so that the receivers need not 
implement a complex estimation mechanism. In LBP, a difficulty arises if the 
leave-group message sent by the leader, leaving group Gi, is not heard at the base 
station. The base station will 
wrongly believe that Gi has a leader although the leader has already signed off. In this 
case, when the base sends out a multicast-RTS for group Gi, it will hear no CTS. 
After several unsuccessful attempts, the base will erase the leader entry corresponding 
to Gi, and stop forwarding packets addressed to this group. The LBP and RLBP have 
similar performance in a loss-free channel. When packet loss is considered, 
the throughput differs, as discussed in the next section. 



2 Performance Study 

In this section, we compare the performance of LBP, DBP and RLBP. We consider a 
scenario where multicast traffic is the only traffic present in the cell, and we examine 
whether control packets may be lost. Time is measured in terms of a basic unit called the slot. 
The basic criterion used for studying the performance of these three protocols is 
the channel holding time associated with a tagged data packet. We can take a measure 
of the throughput using this criterion. 

We consider two cases in this section. The first is the error-free channel and the second is 
the lossy channel, which requires retransmissions. The error-free channel is an idealized case, since all 
packets are received correctly. In the second case, we derive a lossy model to illustrate 
the retransmission requirements. 

2.1 Error-Free Transmission 

In DBP, a receiver hearing a multicast-RTS from the base starts a timer with a value 
chosen at random from the set {1, 2, ..., L}. We assume that the value L is made 
available to the receivers in some way; for example, it may be carried in a field in the 
RTS packet. The receiver whose timer expires first sends a CTS. Upon hearing the CTS, 
the other receivers, whose timers have not yet expired, suppress their own CTSs. A 
CTS collision occurs if two or more receivers happen to choose the same initial value 
for their timers. 

Since the receivers send the CTS after a delay, the base must wait for some time to 
hear the CTS. This is the base’s timeout period for T slots. If a base does not hear the 
CTS within time T, it assumes there was a collision, and tries again. We choose T<L. 
This is because if T is large, then a lot of time is wasted before the base times out. On 
the other hand, choosing a moderately large L helps in avoiding a CTS collision 
within T. 

Given the number of receivers N and the values of L and T, the probability that the base hears a 
CTS within time T, called $p_h$, can be expressed as follows: 




2 
Fig. 1. Variation of $p_h$ with L, keeping N and T fixed. 

In Figure 1, we show how $p_h$ varies with L when N and T are fixed. In all cases we 
find that $p_h$ first increases, hits a peak and then decreases as L is increased. When L is 
small, the chances of a CTS collision increase. When L is large, the chance that no 
receiver sends a CTS within the timeout period T increases. The best values for $p_h$ 
are therefore found in the middle. 
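
The behaviour described above can be reproduced numerically. The sketch below assumes (this is not stated in the text) that the N timers are independent and uniform on {1, ..., L}, and that the base hears a CTS exactly when the earliest timer value is at most T and is held by a single receiver, so a collision in the earliest slot spoils the attempt. Under these assumptions the probability is the sum over t = 1, ..., T of N (1/L) ((L - t)/L)^(N-1). The function name `p_hear` is hypothetical.

```python
def p_hear(N, L, T):
    """Probability that the base hears a CTS within T slots (a sketch).

    Assumes N independent timers, uniform on {1, ..., L}, and that only a
    unique earliest timer value at or below T produces an audible CTS.
    """
    total = 0.0
    for t in range(1, min(T, L) + 1):
        # One specific receiver picks t (probability 1/L), the other N - 1
        # all pick something later (probability (L - t)/L each); there are
        # N choices for which receiver is the early one.
        total += N * (1 / L) * ((L - t) / L) ** (N - 1)
    return total
```

With N = 30 and T = 10, the value is small for L = 10, larger for L = 60 and smaller again for L = 1000, which matches the rise-and-fall shape described for Figure 1.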

In order to create a situation favorable to DBP, the following assumption is made: 

Assumption S: If no CTS is heard within the timeout period T, the base does not 
back off. 

Under this condition, we ask the question: on the average, how long does the base 
spend in the access period? 

Let $T^{\mathrm{DBP}}$ be the random variable representing the total time spent by the base in 
the access period, measured from the instant when it is ready to send the first RTS. 
We assume that it takes 1 slot to transmit the RTS or any other control packet. Let $A$ 
denote the event that the base hears a CTS within T slots of sending the first RTS, and 
let $\bar{A}$ denote the complementary event. Then we have 

$$T^{\mathrm{DBP}} = \begin{cases} 1 + t & \text{if } A \text{ occurs} \\ (1 + T) + W & \text{if } \bar{A} \text{ occurs} \end{cases}$$

where $t \le T$ is the time at which the CTS is heard if $A$ occurs, and $W$ is the time 
spent in the access period after the first timeout. 

Now the distribution of $W$ is the same as the distribution of $T^{\mathrm{DBP}}$, and since 
$\mathrm{Prob}(A) = p_h$, we obtain 

$$E(T^{\mathrm{DBP}}) = E(t \mid A) + \frac{1 - p_h}{p_h}\,T + \frac{1}{p_h}$$

Figure 2 presents some examples of how $E(T^{\mathrm{DBP}})$ varies with the 
parameters L and T. The number of receivers, N, is chosen to be 30. For a fixed T, 
$E(T^{\mathrm{DBP}})$ first decreases, reaches a minimum and then increases again as L is 
increased. This is because $E(T^{\mathrm{DBP}})$ is high when $p_h$ is low and vice versa. 
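
The mean access time can be evaluated with a short script. Taking expectations of the two-case expression for the access time and solving for the mean gives E(t | A) + ((1 - p_h) / p_h) T + 1 / p_h. The sketch below evaluates this under the same assumptions as before, which are not taken from the text: independent timers uniform on {1, ..., L}, and a CTS heard only when the earliest timer value is unique and at most T. The function name `dbp_access_time` is hypothetical.

```python
def dbp_access_time(N, L, T):
    """Mean access time of DBP in slots under Assumption S (a sketch)."""
    # Probability that the base hears the CTS exactly t slots after the RTS.
    hear_at = {
        t: N / L * ((L - t) / L) ** (N - 1)
        for t in range(1, min(T, L) + 1)
    }
    p_h = sum(hear_at.values())
    if p_h == 0:
        return float("inf")  # a CTS can never be heard with these parameters

    # E(t | A): mean arrival slot of the CTS, given that one is heard.
    e_t = sum(t * p for t, p in hear_at.items()) / p_h

    # E(t | A) + (1 - p_h) / p_h * T + 1 / p_h
    return e_t + (1 - p_h) / p_h * T + 1 / p_h
```

For N = 30 and T = 10, evaluating the function at L = 10, 60 and 1000 shows the mean access time falling and then rising again, in line with the description of Figure 2.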

Fig. 2. Comparison of the expected time of DBP [4] 



In the roaming scheme, all members may roam to another cell with a random 
probability. Since every CTS is sent using a random delay mechanism whenever a 
multicast-RTS arrives, it is not necessary to consider the roaming probability for DBP. 

In the LBP protocol, after hearing the multicast-RTS, a receiver sends a CTS in 
the next slot with probability p. The base waits for 1 slot after sending the RTS. If 
exactly 1 member happens to reply, the access period is complete. If the base does not 
hear a CTS, it then has to restart the process by sending the multicast-RTS again. 

The minimum time spent in the access period is 2 slots, 1 to send the RTS and 1 to 
hear the CTS. Let $p_s$ be the probability that the access period lasts 2 slots. Then we 
get 

$$p_s = N p (1 - p)^{N-1}$$

Under Assumption S, the number of attempts necessary for the access period to be 
complete is geometrically distributed with parameter $p_s$. Hence the mean time spent in 
the access period, $E(T^{\mathrm{LBP}})$, is given by $2/p_s$. To minimize this time we choose p so 
that $p_s$ is maximized. This is achieved for p = 1/N, giving the following expression for 
the mean time: 

$$E(T^{\mathrm{LBP}}) = \frac{2}{(1 - 1/N)^{N-1}}$$

The value of N can be transmitted to the receivers from the base in a field of the RTS 
packets, for example. 
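
The effect of the choice p = 1/N can be checked numerically. The sketch below simply evaluates 2 divided by N p (1 - p)^(N-1); the function name `lbp_access_time` and the default argument are illustrative choices.

```python
import math


def lbp_access_time(N, p=None):
    """Mean access time in slots when each of N receivers replies with
    probability p in the slot after the RTS (a sketch)."""
    if p is None:
        p = 1 / N  # the value that maximises the success probability
    # Exactly one of the N receivers replies.
    p_success = N * p * (1 - p) ** (N - 1)
    # Each attempt costs 2 slots; the number of attempts is geometric.
    return 2 / p_success
```

With p = 1/N the mean is 4 slots for N = 2 and grows slowly with N; since (1 - 1/N)^(N-1) tends to 1/e, it approaches `2 * math.e`, roughly 5.44 slots, for large groups.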

Considering the roaming scheme, when the leader roams to another cell, the base 
station must choose one of the other members as a leader. This mechanism requires 2 
slots. Assuming that the roaming probability is $p_r$, the mean time for LBP is: 

$$E(T_r) = \frac{2}{(1 - 1/N)^{N-1}} + 2 p_r$$

Finally, we consider the RLBP protocol. Although there is a fake leader, the RLBP 
leader is chosen using a random delay mechanism. If there is no leader, the mean 
time spent in the access period, $E(T^{\mathrm{RLBP}})$, is equal to the mean DBP time. If 
there is a fake leader, the mean time spent in the access period is equal to 
the mean LBP time. Considering the roaming scheme, if the fake leader stays in 
the same cell, then when the base station sends a multicast-RTS, the fake leader will 
immediately reply that it is ready to receive data. The other members will be quiet 
after hearing a CTS. If the fake leader roams to another cell, there is no longer a 
leader. All of the members will then start a random delay mechanism to send a CTS reply. 
The receiver which sends a CTS will become the fake leader and its delay timer will 
be set to 0. 

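Assuming that the roaming probability is $p_r$, the mean time cost of RLBP is: 

$$E(T^{\mathrm{RLBP}}) = (1 - p_r)\,E(T^{\mathrm{LBP}}) + p_r\,E(T^{\mathrm{DBP}})$$

Hence, 

$$E(T^{\mathrm{RLBP}}) = (1 - p_r)\,\frac{2}{(1 - 1/N)^{N-1}} + p_r \left( E(t \mid A) + \frac{1 - p_h}{p_h}\,T + \frac{1}{p_h} \right)$$

Cost Under DBP, LBP and RLBP 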

Consider DBP. When the channel is error-free, no NAKs are necessary because no 
packet is received in error. The base station transmits a multicast-RTS and waits 
for the timeout period T to hear a CTS. After several possible attempts, the base hears 
the CTS and transmits the packet. 

We focus on a tagged packet and consider the mean time required to transmit 
the packet, including the time spent in the access period. We consider this time to be the 
cost associated with the tagged packet. The cost to transmit a packet gives a measure 
of the efficiency of the protocol. Let the data packet transmission time be C slots. 
The cost of a packet under DBP is then $E(T^{\mathrm{DBP}} + C)$. 

Consider the events on the channel under LBP. A packet transmission is preceded 
by 2 slots, 1 for the multicast-RTS and 1 for the CTS, and is immediately followed by an 
ACK packet, which also occupies 1 slot. Thus, the cost of a packet transmission under 
LBP is $C + 3 + 2p_r$. 

Consider the events on the channel under RLBP. When fake leader roaming occurs, the 
RLBP cost is equal to the DBP cost; otherwise the cost is equal to the LBP cost. 
Assuming that the roaming probability is $p_r$, the cost of a packet transmission under 
RLBP is $(1 - p_r)(C + 3) + p_r\,E(T^{\mathrm{DBP}} + C)$. 

Assuming that it takes 20 slots to transmit a data packet (C=20), and setting the 
roaming probability to 0% and 40%, we obtain the results shown in Figure 3(a)-(b). 

Fig. 3. (a) Cost under roaming probability 0%, (b) Cost under roaming probability 40% 

From Figures 3-4, we can see that the costs of LBP and RLBP are very close in an 
error-free channel, and both are better than the best performance achieved by DBP. 
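
The three cost expressions can be compared side by side with a small helper. The sketch below takes the mean DBP access time as an input rather than computing it, and the value used in the usage note is a made-up placeholder, not a result from the paper. The function name `packet_costs` and the dictionary keys are illustrative.

```python
def packet_costs(C, p_r, dbp_access):
    """Per-packet cost in slots for DBP, LBP and RLBP (a sketch).

    C          -- data packet transmission time in slots
    p_r        -- roaming probability of the (fake) leader
    dbp_access -- mean DBP access time in slots, supplied by the caller
    """
    dbp = dbp_access + C
    # RTS + CTS + ACK cost 3 slots; re-selecting a leader costs 2 more
    # slots with probability p_r.
    lbp = C + 3 + 2 * p_r
    # RLBP behaves like LBP while the fake leader stays, like DBP otherwise.
    rlbp = (1 - p_r) * (C + 3) + p_r * dbp
    return {"DBP": dbp, "LBP": lbp, "RLBP": rlbp}
```

For example, with C = 20, no roaming and a placeholder DBP access time of 7 slots, the helper returns 27 slots for DBP and 23 slots for both LBP and RLBP, which illustrates why the two leader-based schemes coincide when the leader never leaves.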



2.2 Lossy Channel 

When the channel is lossy, packets are received in error and retransmissions are 
required. In a lossy channel, control packets like RTS, CTS, ACK, NAK may be lost 
because of collision or other problems. In DBP, since the CTS packets are not sent by 
any particular receiver, the mean time cost will not be changed in DBP. In LBP, when 
the leave message sent by the leader is lost, the base station will try several times with 
multicast-RTS to prepare to send data. After several unsuccessful attempts, the base 
will erase the leader entry corresponding to that group, and stop forwarding packets 
addressed to that group. If there are other group members that are still interested in 
this group, they will eventually time out and start the subscription process for the 
group again. This mechanism may require much time to produce a new leader. 

The RLBP protocol does not maintain a group-leader table at the base. The base 
will not expect a CTS packet from a particular receiver. When a fake leader leaves 
without acknowledgement, the other receivers will not hear a CTS packet from one 
another. A random delay process will then begin to decide a new fake leader. When 
packet loss occurs, the mean time cost will equal DBP. When no packet loss occurs, it 
will be equal to LBP. 

We set the loss probability (LP) as 5% and 10%, and cost under roaming 
probability (CRP) as 10% and 40%. We obtained the following figures. (Figure 4(a)- 
(d)) 

From these figures, we can see that the costs of LBP and DBP are close under high 
roaming probability in the lossy model. This is because when packets are lost, the LBP protocol must 
spend more time rebuilding the multicast tree. The greater the loss probability, the 
greater the time spent. When the number of members becomes larger, the time spent 
in rebuilding the multicast tree also increases. We can see that RLBP becomes more 
efficient as the roaming probability and loss probability become higher. This is 
because the RLBP protocol does not need to rebuild a multicast tree if the base 
receives no CTS messages for a long time. It will automatically pick another member of 
the group as the leader. 







Fig. 4. (a) CRP=10% LP=5% (b) CRP=40% LP=5% (c) CRP=10% LP=10% (d) CRP=40% 
LP=10% 



3 Conclusions 

In this paper, the proposed protocol uses responses from a pseudo leader to avoid 
feedback collisions and to reduce the problems caused by packet loss. The base 
station does not need much time to wait for CTS messages. The RLBP protocol 
provides a very efficient solution to the CTS and ACK/NAK collision problem. 
Compared to the LBP protocol, RLBP does not waste time when control messages are 
lost, and it is not necessary to choose a receiver to transmit the CTS with a random 
delay mechanism. In addition, this method is very simple to implement and can be 
integrated easily into the current wireless LAN standard. A group-leader table is not 
maintained at the base, so the base will not expect a CTS packet from a particular 
receiver. In a loss-free channel, the performance of the LBP and RLBP methods is 
similar. In environments with packet loss, the RLBP protocol 
performs better. 
Abstract. The availability of wireless technologies such as HomeRF, 
WaveLAN and Bluetooth has propelled the demand for their use in the office, 
home and public spaces. These technologies can provide wireless multimedia 
services. QoS for these services can be provided by means of a MAC level 
QoS or a network level QoS. In this paper we describe the implementation of a 
network level QoS for the 802.11b technology by using the Windows 2000 
operating system. The testbed is used to obtain the throughput of UDP flows 
created by applications such as iperf and qtcp for different numbers of active 
stations, packet sizes and packet loss. These performance evaluation results are 
validated by means of stochastic simulation of the testbed using GloMoSim. 



1 Introduction 

The increase in computer processing power and availability of high-bandwidth 
communication networks has fuelled the demand for networked multi-media desktop 
communications. Users of broadcast television and telephone networks have the same 
level of expectancy for Quality of Service (QoS) from networked multimedia 
services. In order to meet such quality of service expectations the IETF has been 
developing a number of mechanisms for quality of service delivery. These include 
IntServ and DiffServ as well as associated protocols RSVP, RTP and RTCP [1-2]. 
There is currently a lot of thrust in the information technology industry to provide 
services using wireless networks, exemplified by the arrival of new and more cost- 
effective technologies such as WaveLAN, HomeRF and Bluetooth. These 
technologies provide mainly asynchronous mechanisms for data delivery, although 
some also provide isochronous channels for delivery of time sensitive services. 

Wireless technology based on the 802.11 standard is currently a popular choice for 
data delivery in wireless LAN environments. Although the standard allows for 
support of isochronous channels, in practice this has not been widely implemented. 
There is also some activity to extend the current standard to support QoS services by 
extending the capabilities of the MAC layer [3]. However, such extensions will 
increase the complexity of the MAC layer and this in turn may increase the cost of the 
technology. Rather than providing QoS support at the MAC layer, a more robust and 
effective approach may be to provide such support at the network layer. Such an 
approach would not depend on the existence of a QoS-aware MAC layer and could 
also be applied to existing wireless LAN technologies. A number of operating system 
developers have provided QoS features in their operating systems. Microsoft, for 
example, has provided extensive QoS support in its Windows 2000 operating system 
and limited QoS support in the Windows 98 operating system. The QoS functionality 
provided by Microsoft is based on the RSVP signaling protocol that conveys the 
reservation requests made by the receiver to the admission control server. Some work 
on the provision of QoS mechanisms is also being done in Linux [4-5] and FreeBSD 
[6]. However, at the moment these operating systems are not completely developed 
from the QoS perspective. In addition, their QoS functionality can be accessed only at 
the kernel level. Based on the availability and functionality of QoS features in various 
operating systems, we have chosen Windows 2000 as the basis for the implementation 
and testing of network-level QoS support in our wireless LAN testbed. 

In this paper, we describe our efforts in the implementation of a network-level QoS 
mechanism in a wireless LAN testbed based on the 802.11b technology. The 
performance of the testbed is evaluated by means of simulations and testbed 
measurements. The results obtained from these tests are then used to evaluate the 
efficacy of network-level QoS for providing QoS support in shared medium wireless 
networks. Section 2 presents an overview of the wireless LAN testbed. Section 3 
shows our simulation and testbed measurement results. Section 4 describes the testing 
of the suitability of network-level QoS support. Section 5 presents our conclusions for 
this work. 



2 Wireless LAN Testbed Overview 



Figure 1 shows a high-level representation of the wireless LAN testbed. 




Fig. 1. Topology of the Wireless LAN Testbed 














722 V. Mirchandani and E. Dutkiewicz 



The testbed consists of a number of clients running the Windows 2000 Professional 
operating system and a server running the Windows 2000 Server operating system. The 
PCs in the testbed are connected wirelessly using ORiNOCO cards, which use the 
802.11b protocol for medium access. The ORiNOCO cards are configured to operate in 
ad hoc mode, allowing more flexibility in topology configuration. 

TCP/IP and packet scheduler modules must be installed on the clients to enable 
their QoS functionality. The purpose of the packet scheduler is to schedule the 
packets to meet the flow requirements. The Admission Control Service (ACS) 
installed in the server offers logical reservation of the resources by consulting the 
policies entered by the network administrator. The ACS server stores the flow policies 
for each of the flows to be created between the sender and receiver. The policy setup 
mechanism offers granularity in the setup by means of a hierarchical QoS policy setup 
mechanism. 



3 Performance Evaluation of Wireless LAN 

The main aim of our performance evaluation studies was to determine the limitations 
of the wireless LAN in supporting real-time IP streaming applications. In this study we 
have concentrated on simple applications which do not have the ability to control their 
transmission rate in response to network conditions. Due to small MAC buffer sizes 
in the wireless LAN and the lack of rate control at the source, packet loss is expected 
to be the critical measure of quality of service for such applications. 



3.1 Simulation Set-Up 

The simulation model of the wireless testbed is shown in Figure 2. 




[Figure: Node N, MAC buffer, channel (802.11b)] 



Fig. 2. Simulation Model for Capacity Limit Determination in Wireless Testbed 

In order to determine capacity limits of the wireless testbed we simulated it using 
the GloMoSim package developed by the University of California, Los Angeles. We used the 
freely available GloMoSim Version 1.2.3 [7]. Each node can transmit a flow of IP 
packets. IP packets are buffered in MAC layer buffers while contending for access to 
the shared wireless channel. In all our tests we used UDP flows. UDP packets may be 
lost due to the MAC buffer overflow during periods of channel contention. In the 
simulation model we were interested in determining the maximum allowable 
transmission rate from each node as a function of the number of active nodes, given 
a particular QoS constraint in terms of packet loss. 

Our wireless network simulations used the following base parameters: 

- channel capacity rate = 11 Mbps 
- DSSS physical layer 
- CBR UDP flows with fixed packet payload sizes 
- RTS/CTS option disabled 
- MAC queue size = 25 frames 



3.2 Capacity Limits with QoS Constraints 



Figure 3 shows the 3D admission region for 1% packet loss. 




Fig. 3. Maximum Allowable Transmission Rates in Mbps for 1% Level of Packet Loss 
Constraint. 



Table 1 shows the maximum available transmission rates for the case of 3 nodes using 
3 different IP packet sizes. Each entry in the table shows the sum of the transmission 
rates from the 3 nodes assuming that the 3 nodes are transmitting IP packets at an 
equal rate. This is expected to be a conservative capacity measure due to the higher 
level of channel contention compared to scenarios in which transmitting rates are not 
equal. The maximum packet size of 2000 bytes was chosen to be below the threshold 
for 802.11 MAC frames in order to avoid IP packet fragmentation. 

Table 1: Maximum Available Transmission Rates for 3 Active Nodes as a Function 
of Packet Payload Sizes for Two Different Packet Loss Constraints 

IP packet size (bytes)   1% packet loss constraint   10% packet loss constraint 
512                      4.470 Mbps                  4.920 Mbps 
1460                     7.400 Mbps                  7.800 Mbps 
2000                     7.800 Mbps                  8.550 Mbps 


The simulation results in Table 1 show the expected behavior: the maximum available 
transmission rate increases as the packet loss constraint becomes less stringent and as 
the packet size increases. 

The following recommendations are made based on the simulation study: 

- The above simulation measurements have been obtained for fixed size packets, 
  but actual applications will generate traffic with variable packet sizes. Therefore, 
  we need to obtain capacity limits given particular packet size distributions. 
- Admission control policies should take into account capacity limitations. 



3.3 Testbed Measurements 

The wireless testbed of Figure 1 was implemented and capacity measurements were 
obtained from it. The main motivation for obtaining the measurements from the 
testbed was to validate the trend observed from the simulation results given in Table 1. 

We investigated two tools for measuring capacity limits in a WLAN: Qtcp from 
Microsoft and Iperf [8]. Both tools provide a means to generate UDP packets. 
However, they cannot provide packet loss measurements since they use a blocking 
mechanism which reduces the UDP transmission rate to avoid packet loss. As such, 
these tools can give only approximate results for allowable transmission rates. 

In our tests we used Iperf since it has a better user interface. The tests using 
1460-byte packets indicated that the maximum capacity of the WLAN is around 6 Mbps. 
With 512-byte packets this capacity was reduced to around 3.5 Mbps. 
These rates are well below those obtained from our simulations. The discrepancy can 
be attributed to the fact that the simulation model assumed an error-free environment 
and a fixed transmission rate of 11 Mbps from each node. On the other hand, an auto 
transmission rate mechanism is implemented in the Lucent driver and it most likely 
adjusts the transmission rate between 5.5Mbps and 11Mbps, depending on the 
prevalent radio interface conditions. 
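
The size of the discrepancy can be made concrete with a short calculation. The sketch below compares the approximate Iperf readings quoted above with the simulated aggregate rates from Table 1. Comparing against the 1% loss column is an assumption made here for illustration, since the measurement tool does not report packet loss; the function name is likewise hypothetical.

```python
# Aggregate rates in Mbps.
# Simulated values: Table 1, 1% packet loss column (assumed comparison basis).
# Measured values: approximate Iperf readings quoted in the text.
simulated = {512: 4.470, 1460: 7.400}
measured = {512: 3.5, 1460: 6.0}


def measured_fraction(packet_size):
    """Fraction of the simulated capacity that the testbed reached."""
    return measured[packet_size] / simulated[packet_size]


for size in sorted(simulated):
    print(f"{size:>5} bytes: {measured_fraction(size):.0%} of simulated capacity")
```

Under this assumption the testbed reaches roughly four fifths of the simulated capacity for both packet sizes, which is consistent with the measurements following the same trend as the simulation.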



4 QoS Support Testing in Wireless LAN 

In order to confirm the working of the QoS mechanisms in our testbed it was 
necessary to use QoS-enabled applications. Although the Windows 2000 operating 
system has QoS capabilities, to our knowledge there are currently only two 
applications that are QoS enabled: Qtcp and Netmeeting. Qtcp is a command-driven 
utility whose main objective is to determine the end-to-end packet delay and packet 
loss in the test network. The utility allows the user to generate a stream of packets of 
different traffic types, packet sizes and transmission rates. It also allows the user the 
choice of using RSVP's reservation signaling. Correct operation of the QoS mechanism 
was verified for the Netmeeting application. The Netmeeting application produces 
three streams: RSVP signaling, a video stream and an audio stream. The video 
stream uses the controlled load service level, the audio stream uses the guaranteed 
service level and the RSVP signaling uses the best effort service level. 
The QoS policies were set in the Admission Control Server (ACS). 

The ACS plays a pivotal role for the QoS provisioning in the Windows 2000 OS. 
The QoS ACS is configured with policies that reflect the QoS requirements of the 
enterprise, with additional policies created for groups or individual users. The QoS 
ACS must be a member of the same domain as the subnet it intends to manage. Setting 
up the testbed involved first creating a managed subnet and assigning the ACS to manage it. 
Then the QoS ACS policies for the subnetwork constituting the testbed were set. 
After this the policies specific to individual nodes were set. The policies are spread 
over two general groups of parameters: flow limits and aggregate limits. Both 
encompass the data rate, peak data rate and duration of the flow. 
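
The interplay of flow limits and aggregate limits can be illustrated with a small admission check. This is only a sketch under stated assumptions: the `Limits` class and the `admit` function are hypothetical names introduced here, they do not correspond to the Windows 2000 ACS interface, and the duration parameter is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical container for two of the policy parameters named in the text."""
    data_rate_kbps: float
    peak_rate_kbps: float


def admit(request, reserved, flow_limit, aggregate_limit):
    """Return True if a new reservation fits both the per-flow and aggregate limits.

    request:  rates asked for by the new flow
    reserved: total rates already reserved by admitted flows
    """
    within_flow = (
        request.data_rate_kbps <= flow_limit.data_rate_kbps
        and request.peak_rate_kbps <= flow_limit.peak_rate_kbps
    )
    within_aggregate = (
        reserved.data_rate_kbps + request.data_rate_kbps
        <= aggregate_limit.data_rate_kbps
        and reserved.peak_rate_kbps + request.peak_rate_kbps
        <= aggregate_limit.peak_rate_kbps
    )
    return within_flow and within_aggregate
```

In this sketch a request that exceeds either limit is simply refused; what happens to the refused traffic is outside its scope.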

The QoS ACS policy was set to offer unlimited bandwidth, peak data rate and 
resources to the Netmeeting application. The messages in the ACS log and 
the RSVP log files of the ACS confirmed that the reservation request messages made 
by the receiving entities in both clients had succeeded. When the bandwidth 
parameters in the policy for the clients were decreased to limit the flow of Netmeeting 
streams to 5 kb/s, the traffic monitor utility indicated that the Netmeeting audio and 
video streams had changed to the best effort service level. The same behavior was 
observed for the Qtcp application, which confirmed that Qtcp is a QoS-enabled 
application. 



5 Conclusion 

We have implemented and demonstrated the operation of a QoS mechanism on a 
wireless testbed that used machines running the Windows 2000 Professional and 
Server operating systems. A novel method based on a table lookup procedure is 
proposed for setting up the policies dynamically in the admission control server. The 
lookup table was obtained using the GloMoSim simulation tool. The practical results 
show that the WaveLAN can have a maximum throughput of around 6 Mbps, and this 
affects the accuracy of validating the simulation results. Even so, the results obtained 
by both simulation and experiment follow similar trends for different packet sizes and 
packet loss rates. The WaveLAN transmission rate cannot be set through the Lucent 
driver, which is a limitation of the driver. Above all, the proposed method would 
facilitate quick setup of policies in the admission controller for a given number of 
active nodes. 
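
A table lookup of this kind might look like the following minimal sketch. The table holds only the simulated values reported in Table 1 for 3 active nodes; the key layout, the function name and the equal split among nodes are assumptions made for illustration, not a description of the actual implementation.

```python
# (active nodes, IP packet size in bytes, packet loss constraint in %)
#   -> simulated aggregate rate in Mbps, taken from Table 1.
CAPACITY_MBPS = {
    (3, 512, 1): 4.470,
    (3, 512, 10): 4.920,
    (3, 1460, 1): 7.400,
    (3, 1460, 10): 7.800,
    (3, 2000, 1): 7.800,
    (3, 2000, 10): 8.550,
}


def per_node_limit_mbps(nodes, packet_size, loss_pct):
    """Look up the aggregate capacity and split it equally among the nodes.

    Returns None when the table has no entry for the requested combination,
    so that the caller can fall back to a default policy.
    """
    aggregate = CAPACITY_MBPS.get((nodes, packet_size, loss_pct))
    if aggregate is None:
        return None
    return aggregate / nodes


print(per_node_limit_mbps(3, 1460, 1))
```

The returned value could then be entered as the per-flow data rate limit for each node, with the aggregate value serving as the subnet-wide limit.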
Abstract. There is a growing demand for bandwidth as well as mobility. 
Within ETSI BRAN a wireless LAN called HIPERLAN/2 has been 
standardized. While data rates can be as high as 54 Mbit/s for a high 
carrier to interferer ratio (C/I), more robust combinations of modula- 
tion and code-rate have to be used and also retransmissions do occur 
when interference is present. This leads to much smaller effective data 
rates. Interference and link adaptation are therefore important topics. 
In order to have a realistic co-channel interference, two radio cells are 
implemented which interact with each other. In both radio cells detailed 
implementations of the protocols are used. Data transmission between 
the terminals is carried out via TCP/IP or UDP. In this paper 
it is shown how throughput as well as delay can be improved in 
all load conditions by reducing co-channel interference and reducing the 
variations in the interference situation which significantly increases the 
effectiveness of link adaptation. 



1 Introduction 

There is a growing demand for bandwidth as well as mobility, which has led to 
several research projects investigating high speed wireless LANs. Within 
ETSI BRAN a wireless LAN called HIPERLAN/2 has been standardized which 
operates in the 5-6 GHz band. It can be used in combination with e.g. ATM or 
TCP/IP and as part of UMTS. 

Within HIPERLAN/2 the modulation and code-rate are adapted to the conditions 
of the radio link. With seven different combinations of modulation and 
code-rate (Phy. Modes), data rates from 6 Mbit/s to 54 Mbit/s with different 
requirements on the C/I and different resilience against transmission errors are 
possible [1]. HIPERLAN/2 is a cellular system with frames of a fixed length of 
2 ms. A frame starts with a Broadcast Channel (BCH) followed by a downlink 
(DL)- and uplink (UL)- phase and the Random Channel (RCH). HIPERLAN/2 
uses Time Division Duplex (TDD) and Time Division Multiple Access (TDMA). 
In the centralized mode an Access Point (AP) serves as central controller. It de- 
cides every frame anew when a wireless terminal shall receive and when it shall 
transmit and which Phy. Modes shall be used for transmission. 

2 Goal of this Work 

While data rates can be as high as 54 Mbit/s for high C/I values (>30dB), more 
robust Phy. Modes have to be used when interference is present. Also more re- 
transmissions do occur if interference becomes higher. This leads to much smaller 
effective data rates. Therefore, interference and link adaptation are important 
topics. In this paper several approaches are presented which can improve the 
interference situation and which can make link adaptation more effective. First, 
one approach is described which works under low and medium load conditions. 
Then it is shown how this approach can be used to improve the situation es- 
pecially for delay sensitive connections. Finally, an approach is presented which 
works under all load conditions. It improves system performance if the first men- 
tioned approaches are not used and increases system performance even more if 
the above mentioned approaches are used. 

2.1 Improving the Interference Situation and Effectiveness 

of Link Adaptation in Low and Medium Load Conditions 

In low and medium load conditions not all the frame is used for transmission. 
These silent periods can be exploited to reduce the interference in co-channel 
radio cells and to make link adaptation easier. Unfortunately, most traffic in 
LANs is data traffic which is transmitted via TCP/IP. Investigations have shown 
that this traffic is quite bursty in nature, especially considering WWW or FTP 
traffic [2]. The burstiness of the data leads to strong variations in the lengths 
of the silent periods. Thus, in low and medium load conditions, there are many 
frames which are completely filled and also many frames which are almost empty. 
The strong variations of the silent periods also mean strong variations in the 
interference situation for the co-channel radio cells, which makes link adaptation 
difficult. Link adaptation benefits when the interference situation that is 
measured now equals the interference in the future; it works best 
for a slowly changing and predictable interference situation. 

In [3] it has been shown that the burstiness in the interference situation is 
reduced with an approach called Reduced Burstiness (RB). Even if big TCP 
segments arrive at the AP, the scheduler does not permit one terminal to use 
the whole frame even if the current system load would allow this. Instead every 
terminal is allowed to use a certain percentage of the frame at maximum. This 
percentage depends on the number of active terminals and their expected mean 
load. It is chosen in a way that no terminal can use the whole frame by its own 
but that all active terminals together can fill the whole frame. 
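
The two conditions stated above (no terminal can fill the frame alone, but all terminals together can) can be illustrated with a small sketch. The text does not give the formula used to derive the percentage, so the rule below is purely an assumption: each terminal's cap is its load share scaled by a hypothetical `headroom` factor, clamped between 1/n and a hypothetical `max_fraction` below 1.

```python
def frame_caps(mean_loads, max_fraction=0.6, headroom=1.5):
    """Return the maximum fraction of a frame each terminal may use.

    mean_loads: expected mean load per active terminal (any common unit).
    Every cap lies in [1/n, max_fraction], so no terminal can fill the frame
    alone (cap < 1) while all caps together sum to at least 1.
    """
    n = len(mean_loads)
    if n < 2:
        raise ValueError("the two conditions need at least two active terminals")
    if not (1.0 / n <= max_fraction < 1.0):
        raise ValueError("max_fraction must lie in [1/n, 1)")
    total = sum(mean_loads.values())
    caps = {}
    for terminal, load in mean_loads.items():
        share = load / total if total > 0 else 1.0 / n
        caps[terminal] = min(max_fraction, max(1.0 / n, headroom * share))
    return caps


caps = frame_caps({"MT1": 3.0, "MT2": 1.0, "MT3": 1.0})
print(caps)
```

Clamping from below at 1/n guarantees that the caps add up to at least the whole frame, while the upper clamp keeps any single terminal from occupying it completely.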

By limiting the percentage of a frame that can be used by one terminal the 
burstiness of transmission is reduced. This is shown in Fig. 1, where the duration 
of the silent periods is shown for RB and for the approach that does not limit the 
burstiness of transmission (Normal Burstiness (NB)). 

With RB, the duration of the silent periods is almost equally distributed 
between 0 ms and the maximum length of the silent periods (see Fig. 1). Since 
BCH and RCH are present in every frame, the duration of the silent period cannot 
become longer than 1.83 ms in this scenario. The standard deviation of the 
duration of the silent periods is reduced from 0.62 ms to only 0.38 ms. These 
results were obtained for a load that led to a mean frame usage of 60%, which 
means that on average 60% of the frame was used. 

Fig. 1. CDF of duration of silent periods 

In [3] it has also been shown that with RB the retransmission load is significantly 
reduced compared to NB. Thus, RB achieves a higher throughput for a 
given delay requirement or a smaller delay for a given throughput. 

Furthermore, it was shown that an intelligent placement of the silent periods 
is necessary to increase throughput and QoS. Several placements of the silent 
period were investigated (see Fig. 2): 

- The silent period is divided by the number of scheduled terminals. An equal 
  share of the silent period is inserted before every scheduled PDU burst 
  (Equal Silent Period Placement (ESP)). 

- The silent period is inserted as one piece before the BCH in both co-channel 
  radio cells (Symmetric Silent Period Placement (SSP)). 

- The silent period is inserted as one piece before the BCH in one radio cell 
  and after the BCH in the co-channel radio cell (Asymmetric Silent Period 
  Placement (ASP)). 

Since the position of the silent periods can be chosen without any restrictions 
on a frame per frame basis, the above described approach works for all offsets 
which can occur between the frames of two co-channel radio cells. 

2.2 Improving the Interference Situation 
for Delay Critical Connections 
in Low and Medium Load Conditions 

Fig. 2. Different placements of the silent periods 

In this paper it will be shown that with RB and an intelligent placement of the 
silent periods it is possible to improve the situation especially for delay sensitive 
connections which must not suffer any or many retransmissions. This can be done 
by the above described approach and by scheduling delay sensitive connections in 
parts of the frame where interference will usually be lower than in other parts of 
the frame. In order to reduce the interference in certain areas of up- and downlink 
in both radio cells, a different version of ASP was used. It is called ASP-Both 
Links Improved (ASP-BLI) and its structure is shown in Fig. 3. The most delay 
critical connections can be transmitted in the areas where interference will 
usually be lower than in other areas. 

Fig. 3. Placement of the silent periods with ASP-BLI 



Note that the length of the unused capacity, the length of the downlink 
and uplink phases and the length of the BCH vary from frame to frame. 
Nevertheless it will be shown that with this approach it is possible to significantly 
improve the situation for delay critical connections in both radio cells and 
for both up- and downlink. 

2.3 Further Improvement of the Interference Situation 

and Effectiveness of Link Adaptation for All Approaches 
in All Load Conditions 

The above described approaches work well under low and medium load conditions. 
For higher load values their effect becomes very small since they depend 
on the existence of unused capacity in the frame. Now another approach will 
be presented which works under all load conditions. It can be used in combination 
with the above described approaches or without them. If it is used together 
with the above described approaches it leads to a further improvement of system 
performance. 

In [3] it was shown that the effectiveness of link adaptation (LA) is increased if the 
interference situation becomes less varying. The following approach pursues the 
same goal. It is called Maximum Similarity (MSI) and tries to achieve as much 
similarity between consecutive frames in a radio cell as possible. If consecutive 
frames in one radio cell are similar, then the interference situation for neighbour 
radio cells is also similar, which means that the interference situation is less 
varying. To achieve this goal the following simple steps have to be performed. 
The scheduler stores the start and the end of its own transmissions in a frame. 
In the next frame, after the scheduling has been performed, an additional step is 
inserted. Within the DL phase and also within the UL phase the order of trans- 
mission of the scheduled PDU bursts can be chosen without any restrictions. 
The scheduler tries to find the order of transmission in the DL phase as well 
as in the UL phase that gives the most resemblance to its previous frame. In 
the current implementation it does this by brute force. It calculates all possible 
permutations of the transmission order of the scheduled bursts and calculates a 
measure for the resemblance between the frames. 

The transmission order is chosen which gives the highest resemblance to its 
previous frame. This ensures that the interference situation varies as little as 
possible. 
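
The brute-force search described above can be sketched as follows. The text does not define the resemblance measure, so the one used here is an assumption chosen for illustration: the total time during which each terminal transmits in the same part of the phase as in the previous frame. All function names are hypothetical.

```python
from itertools import permutations


def layout(order, durations, start=0.0):
    """Place the bursts back to back in the given order.

    Returns {terminal: (begin, end)} within one DL or UL phase.
    """
    slots = {}
    t = start
    for terminal in order:
        slots[terminal] = (t, t + durations[terminal])
        t += durations[terminal]
    return slots


def resemblance(slots, prev_slots):
    """Assumed measure: summed overlap between a terminal's burst in this
    frame and the same terminal's burst in the previous frame."""
    total = 0.0
    for terminal, (begin, end) in slots.items():
        if terminal in prev_slots:
            prev_begin, prev_end = prev_slots[terminal]
            total += max(0.0, min(end, prev_end) - max(begin, prev_begin))
    return total


def most_similar_order(durations, prev_slots, start=0.0):
    """Try every permutation of the scheduled bursts and keep the best one."""
    best = max(
        permutations(sorted(durations)),
        key=lambda order: resemblance(layout(order, durations, start), prev_slots),
    )
    return list(best)


# Burst boundaries stored from the previous frame (in ms) and the burst
# lengths that the scheduler has granted for the current frame.
previous = {"A": (0.0, 0.4), "B": (0.4, 0.7), "C": (0.7, 1.0)}
granted = {"C": 0.3, "A": 0.5, "B": 0.2}
print(most_similar_order(granted, previous))
```

The number of permutations grows factorially with the number of scheduled bursts, so an exhaustive search of this kind is only practical for a small number of bursts per phase.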

In Fig. 4 an example of the unsorted transmission order and of the transmission 
order with MSI is given. Both have been extracted from one of the 
simulations performed. For clarity only the downlink 
phase is shown, but the same behaviour applies to the uplink phase. 

Note that the scheduling itself may be performed by any algorithm. 
Only after the scheduling has been performed does the scheduler try 
to find the transmission order that gives the maximum similarity between the 
frames. 



3 Scenario and Simulation Environment 

In this paper the following scenario is considered. It consists of a big exhibition 
hall with 16 APs and a site to site distance of 62.5 m. There are 8 frequencies 
available which means that one frequency is used by two APs. Per radio cell, a 
number of active Mobile Terminals (MTs) are moving around with a speed of 
3 km/h. Each active MT sustains a bi-directional connection with the AP. In a 
TCP connection, 75% of the generated user load is in downlink direction and 
25% is in uplink direction. In a UDP connection, 50% of the generated user load 
is in each of the downlink and uplink directions. 

Fig. 4. Comparison between unsorted transmission order and transmission order with 
MSI 

The attenuation of signals is calculated via the following one slope model for 
LOS propagation in indoor large open spaces: 

\[ L_d\,[\mathrm{dB}] = 46.7 + 24 \cdot \log(\mathrm{distance} / 1\,\mathrm{m}) \qquad (1) \]

Adjacent channel suppression is assumed to be so high that adjacent channel 
interference can be neglected. So far no power control is used and all terminals 
send with equal transmission power. Applying equation (1) and the distance 
to the interferer, the C/I value is calculated. Furthermore, log-normal fading 
with a standard deviation of 7 dB is added in order to model shadowing caused 
by e.g. people moving around. A model which was proposed in ^ is applied. It 
uses the following correlation function with a decorrelation length dcorr of 3.5 m 
0 : 

\[ R(\Delta x) = \ldots \qquad (2) \]

Using files generated from link level simulations, the calculated C/I is mapped 
to a PER, which is then applied to the PDU in question. 
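
As a worked illustration of equation (1), the sketch below computes the attenuation and the resulting C/I for a single interferer. It assumes a base-10 logarithm, equal transmission power for carrier and interferer (as stated above) and no fading; the function names are hypothetical.

```python
import math


def path_loss_db(distance_m):
    """One slope model of equation (1); base-10 logarithm assumed."""
    return 46.7 + 24.0 * math.log10(distance_m)


def c_over_i_db(d_carrier_m, d_interferer_m):
    """C/I in dB for one interferer, equal transmission powers, no fading.

    With equal powers the transmit terms cancel, so C/I is the difference
    between the attenuation towards the interferer and towards the carrier.
    """
    return path_loss_db(d_interferer_m) - path_loss_db(d_carrier_m)


# A terminal 10 m from its own AP and 100 m from the co-channel transmitter.
print(round(c_over_i_db(10.0, 100.0), 1))
```

Under these assumptions every factor of ten in the distance ratio contributes 24 dB to the C/I, before shadowing is added.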

In the current state of the simulation, perfect measurement of the C/I values 
is assumed for the link adaptation. The AP and the MTs store the last N C/I 
values of every connection. These values constitute the basis for the decision 
which Phy. Mode is used. 
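
A decision based on the last N C/I values might be sketched as follows. The seven data rates are those of the HIPERLAN/2 Phy. Modes; the C/I thresholds, the class name and the conservative rule of deciding on the worst stored value are assumptions for illustration only, since the text does not describe the decision rule.

```python
from collections import deque

# Data rates of the seven HIPERLAN/2 Phy. Modes in Mbit/s.
PHY_RATES_MBPS = [6, 9, 12, 18, 27, 36, 54]
# Hypothetical minimum C/I in dB per mode; illustrative values, not measured.
MIN_CI_DB = [6, 9, 11, 14, 18, 23, 30]


class LinkAdapter:
    """Keeps the last n C/I values of one connection and picks a Phy. Mode."""

    def __init__(self, n=5):
        self.history = deque(maxlen=n)

    def report(self, ci_db):
        """Store a new C/I measurement; the oldest one is dropped when full."""
        self.history.append(ci_db)

    def select_rate(self):
        """Return the fastest rate whose threshold the worst stored C/I meets."""
        if not self.history:
            return PHY_RATES_MBPS[0]
        worst = min(self.history)
        rate = PHY_RATES_MBPS[0]
        for candidate, threshold in zip(PHY_RATES_MBPS, MIN_CI_DB):
            if worst >= threshold:
                rate = candidate
        return rate
```

With such a rule a single low C/I value in the window forces a robust mode for the next N decisions, which illustrates why a strongly varying interference situation costs throughput.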

4 Modelling of Co-channel Interference 

In order to have a realistic co-channel interference, two radio cells are implemented 
which interact with each other. In both radio cells detailed implementations 
of the protocols for AP and MTs are used. The sources generate data 
which is transmitted via TCP/IP or UDP between the terminals. In 
the convergence layer the TCP or UDP segments are segmented to fit into User-PDUs 
(U-PDUs). The U-PDUs are then transmitted via the wireless link. A 
Selective Repeat ARQ scheme with bitmap acknowledgements is implemented in 
detail as described in Q with a limited ARQ window size (128 in the presented 
simulations). The collision resolution for the RCH is implemented in detail as 
described in [7]. The scheduling of acknowledgements and data is performed in 
every frame on PDU basis. 

No simplifications are made with respect to the described protocols. 

5 Results 

With the scenario described above, simulations are carried out to evaluate the 
approaches described above. 

In order to show how much the delay for delay critical connections can be 
improved with RB and ASP-BLI when the delay critical connections are placed in 
an area of the frame where interference will usually be lower than in other areas 
of the frame, the following scenario was used. It consists of the above described 
scenario with a mix of UDP and TCP connections. There are 10 active bi-directional 
connections per radio cell. 12% of the load is generated by the UDP 
sources and 88% by the TCP sources. On average 55% of the frame is used. 

In Fig. 5 and Fig. 6 the delay for the UDP connections is shown for RB with 
ASP-BLI and ESP. Although the length of the unused capacity, the length of 
the downlink and uplink phases and the length of the BCH vary from frame 
to frame, it can be seen that ASP-BLI significantly improves the situation for 
delay critical connections. Down- and uplink delay are significantly better for 
ASP-BLI than for ESP. The improvement of the delay performance is stronger 
in the uplink direction than in downlink direction. If PDUs get lost in uplink 
direction, the MT has to inform the AP first about additional capacity requests 
before they can be transmitted. This leads to a further increase of the delay if 
PDUs get lost. Thus it is very important to reduce the number of retransmissions 
especially in uplink direction if delay critical connections are concerned. Results 
are only shown for one radio cell but they are very much the same for the other 
radio cell. 

The following scenario is used to show the influence of MSI on system perfor- 
mance. It consists of 5 active bi-directional TCP connections per radio cell. The 
generated user load refers to pure user data without any overhead. Due to TCP 
timeouts, segment retransmissions do occur and due to duplicate TCP acknowl- 
edgements, fast segment retransmissions do occur, which are not counted as 
generated load. Also, no TCP/IP, convergence layer or HIPERLAN/2 overhead 
was included in the generated user load. 

Fig. 5. CDF of downlink delay of UDP segments for RB with ASP-BLI and ESP 

Fig. 6. CDF of uplink delay of UDP segments for RB with ASP-BLI and ESP 

In Fig. 8 it can be seen that MSI significantly reduces the retransmission load 
for ASP. In order not to overload Fig. 8 the curves for SSP-MSI and ESP-MSI are 
not shown. The relative improvement of MSI for SSP is similar to that of ASP. 
For ESP the improvement with MSI is very small and hardly visible. This is due 
to the fact that ESP leads to a high variance in the interference situation. Taking 
a look at the C/I cumulative distribution function (see Fig. 7), it can be seen that 
ASP and ESP have almost identical C/I distributions (no difference is visible 
in the diagram), but the difference in the retransmission load between them is 
significant. This is due to the fact that with ASP the interference situation is less 
variable than with ESP and link adaptation can work much more effectively. This 
can be explained as follows. The C/I is counted on a PDU basis. For every 
transmitted PDU the C/I value is measured in the simulations and counted in 
the C/I distribution function. With ESP, only short silent periods are inserted 
before the scheduled bursts. For the co-channel radio cell this means that in a 
PDU burst, there may be some PDUs that overlap with the silent period in the 
co-channel radio cell and there are other PDUs of the same PDU burst that 
overlap with transmissions in the co-channel radio cell. This means that in one 
PDU burst belonging to one connection there are PDUs which have a high C/I 
value and others which have a lower C/I value. This is a difficult situation for 
the link adaptation to cope with since there is no optimal Phy. Mode for this 
situation. If too high a Phy. Mode is used, many retransmissions occur. If 
too robust a Phy. Mode is used, transmission capacity is wasted. With ASP 
there exists only one long silent period. In the co-channel radio cell there will be some 
PDU bursts which overlap completely with the silent period and others which 
do not overlap with the silent period at all. Then the PDUs of a PDU burst have 
either a high or a low C/I value. While the number of PDUs that overlap with 
the silent period in the co-channel radio cell is almost the same for ASP and 
ESP (leading to an almost identical C/I distribution), link adaptation can work 
much more effectively with the situation produced by ASP. This behaviour of 
ESP makes it impossible for MSI to gain much improvement since the variance 
in the interference situation produced by ESP is too dominating. 




Fig. 7. CDF of C/I for RB with ASP, SSP and ESP 

Fig. 8. Retransmission Load for RB and several placements of the silent periods (with 
and without MSI); axis label: Generated User Load [Mbit/s] 

6 Summary and Conclusion 

In a HIPERLAN/2 system with co-channel interference, interference and link 
adaptation are important topics. In order to have a realistic co-channel interfer- 
ence, two radio cells are implemented which interact with each other. In both 
radio cells detailed implementations of the protocols for AP and MTs are used. 
Data transmission between the terminals is carried out via TCP/IP and UDP. 

It has been shown that co-channel interference can be reduced and link
adaptation can become more effective in low and medium load conditions. This is
done by limiting the burstiness of transmissions and by an asymmetric placement
of the silent periods in two co-channel radio cells. With a modified asymmetric
placement of the silent periods it is possible to improve the interference
situation in certain areas of the frame for both uplink and downlink and in both
co-channel radio cells. This improvement can be exploited to significantly
reduce the delay of delay-critical connections, which can be scheduled in areas
of the frame where interference is usually lower than in other areas. These
approaches work well for low and medium load conditions, but their effect
becomes very small in higher load conditions since they depend on the existence
of unused capacity in the frame. Another approach was presented which works
under all load conditions. With a simple algorithm, and independent of any other
conditions, it maximizes the similarity between consecutive frames in one radio
cell. If consecutive frames in one radio cell are similar, so is the interference
situation for co-channel radio cells. It has been shown that it is possible to significantly
reduce the retransmission load under all load conditions if this approach is
applied. In this paper an ideal measurement of the C/I values is assumed for the
link adaptation. In the future the effect of the described approaches will be
investigated for a realistic link adaptation and actually available measures of
the radio link quality.



Abstract. A future ITS info-communication system using DSRC is discussed.
As the first step towards a future inter-media multimode terminal, an
intra-media multimode wireless communication terminal using software defined
modem technology is developed in order to accommodate several DSRC service
networks. The experimental software defined modem can work at a high data
rate of 4 Mbps with π/4 DQPSK. It enables multiple modulation systems
(π/4 DQPSK and GMSK), multiple frame formats, and multiple bit rates.



1 Introduction 

Recently, Intelligent Transport Systems (ITS) have become the focus of global
attention. The various info-communication services in ITS play an important role
in safe and comfortable driving, and they are therefore being researched and
developed all over the world [1]. These services are provided through cellular
telephone systems, Dedicated Short-Range Communications (DSRC), and digital
broadcasting, as shown in Figure 1. A seamless-connection-communication
architecture among these networks for ITS info-communication services has been
reported [2].

However, new services require new on-board equipment. As a result, cars may
become filled with many communication terminals. Furthermore, the content
provided by ITS services needs to be revised or expanded as the infrastructure
of ITS info-communication systems is gradually improved. Similarly, as the
wireless communication technology of the vehicular terminal equipment
progressively improves, the communication system in that equipment needs
upgrading from an existing system to an advanced one. Therefore, a multimode
wireless communication terminal [3] [4] using software radio technology
[5] [6] [7] has been proposed and developed.

DSRC service networks, such as electronic toll collection (ETC), local traffic
information services, emergency dispatch services, and parking payment systems,
are peculiar to ITS. Therefore, as the first step towards a future inter-media
multimode terminal that can handle a cellular telephone system, several DSRC
services, and digital broadcasting, this paper examines an intra-media multimode
wireless communication terminal for DSRC service networks.

State-of-the-art DSRC service systems for ITS in Japan are briefly described in
Section 2. Specifications and experimental results of an intra-media multimode
wireless communication terminal for DSRC service networks are shown in
Section 3.




Fig. 1. ITS Info-communication systems in the future (legible figure labels:
parking payment, limited local traffic information, inter-vehicle radar (76 GHz))



2 DSRC Service Systems 

The DSRC system is one of the mobile communication systems among the ITS
info-communication systems. The DSRC architecture has been developed according
to the ISO-OSI layer model, but due to real-time constraints a three-layer
approach has been chosen, consisting of the physical layer, the data link layer,
and the application layer. In Japan, a DSRC standard has been completed, and it
has been introduced for the ETC system [8] [9]. An outline of the physical layer
in this standard, ARIB (Association of Radio Industries and Businesses)
STD-T55, is listed in Table 1. An On-Board Terminal (OBT) uses an active
transponder. Frequency division duplex is available because different
frequencies are provided for uplink and downlink. In five years, the ETC system
will be introduced at about 730 tollgates out of the total 1300 tollgates of
Japanese national expressways.

Another DSRC application, road-to-vehicle communication in smart cruise
systems [10], has employed an experimental radio specification with π/4 DQPSK
modulation, a data rate of 512 kbps, an information update cycle of 100 msec,
and an available communication distance of 100 m, while the 5.8 GHz band and
TDMA/FDD multiple access were also used.

VICS (Vehicle Information and Communication System) [11][12], a traffic
information service, is operated through radio-wave beacons, infrared light
beacons, and FM multiplex broadcasts. The radio-wave beacon system, which uses
the 2.5 GHz band with a data rate of 64 kbps, is a simplex application of DSRC.



Table 1. ARIB STD-T55 for ETC

Item                                Specifications
Carrier frequency                   5.8 GHz band
Transmission power                  Roadside station: max. 300 mW
                                    Mobile station: max. 10 mW
Modulation system                   ASK (Amplitude Shift Keying)
Data rate                           1024 kbps
Available communication distance    Max. 30 meters
Multiple access                     TDMA-FDD system
Access control                      Slotted ALOHA protocol



These are the first generation of DSRC systems. In the ETC system, for example,
a TDMA system with up to 8 time slots is adopted to serve several users in a
spot zone. The data transfer rate per user may therefore actually be around
100 kbps, and the communication time is also short because the zone is small.
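
The per-user figure can be checked with a short calculation. The sketch below is only a rough estimate under the assumption that the 1024 kbps channel of Table 1 is shared equally by all 8 slots and that framing and control overhead is ignored, which is why it lands somewhat above the "around 100 kbps" quoted in the text. The function name is illustrative.

```python
def per_user_rate_kbps(channel_rate_kbps: float, slots: int) -> float:
    """Gross rate per user when every TDMA slot serves a different user."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return channel_rate_kbps / slots


# Values from Table 1 and the text: 1024 kbps channel, at most 8 time slots.
rate = per_user_rate_kbps(1024, 8)
print(f"gross per-user rate: {rate} kbps (before protocol overhead)")
```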

To encourage the continued development of DSRC services, techniques for higher
data rates should be studied for the next generation, so that next-generation
DSRC can provide high-speed and broadcast communication services. Moreover,
the DSRC system will have the following features.

1. Effective use of frequencies by using small-zone configurations

2. Large-volume, high-speed information transmissions to moving vehicles

3. Electronic payment through radio communications

4. Services provided by Internet connections

Taking advantage of these features, the widespread development of ITS services
based upon DSRC is expected. In North America, work is under way on the next
generation of DSRC standards in the 5.9 GHz band [13]. In the EC, a project is
being developed to combine the extensive functionality of an intelligent
in-vehicle terminal with the use of a two-way communication link with the
roadside [14].

Many base stations along roadsides will be connected through DSRC networks,
which is one of the most important study issues in DSRC systems. For example, a
DSRC network architecture for ITS multicast services [15] has been reported.



3 Experimental System of Intra-media Multimode Wireless
Communication for DSRC services

The final target of ITS multimode wireless communication terminals is to handle
three different kinds of ITS info-communication media: cellular telephone
systems, DSRC, and digital broadcasting. Such a terminal is called an
inter-media multimode terminal. However, multiple applications of DSRC will be
realized in the near future. An intra-media multimode wireless communication
terminal for DSRC service networks should therefore be examined as the first
step towards the final target.

In order to check the feasibility of high-speed information transmission to
moving vehicles, a data rate of 4 Mbps using π/4 DQPSK and a duplex system were
employed in the experimental system of intra-media multimode wireless
communication for DSRC services. Furthermore, broadcast-type information
transmission is cost-effective and will continue to be used in the future.
Therefore, both the roadside base station A for the high-speed duplex system
and the roadside base station B for the simplex system were experimentally
developed. The two base stations employ different modulation systems and
different carrier frequencies, so the experimental systems provide a setting in
which to study multimode and multi-frequency terminals. Figure 2 shows the block
diagram of the experimental system, which contains an experimental multimode
wireless terminal as a mobile unit.




Fig. 2. An experimental DSRC system for the multimode wireless terminal (legible
figure labels: fiber LAN, experimental roadside base station B, multimode
wireless terminal (mobile unit))



4 An Experimental Multimode Terminal

Basic specifications of the experimental intra-media multimode terminal using
software defined radio technology in Fig. 2 are listed in Table 2.

Table 2. Basic specifications of an experimental intra-media multimode terminal

Items                           Specifications
Link                            Up and down        Down only
Modem   Modulation systems      π/4 QPSK           GMSK
        Data rate               4000 kbps          64 kbps
        Signal format           continuous or time division
        Output signals          I and Q signal
RF      Carrier frequency       5.8 GHz            2.5 GHz



A multi-frequency microwave antenna has been developed which covers both
5.8 GHz and around 2.5 GHz. To deal with the multi-frequency signal provided by
the antenna, the multi-band RF section employs two receiver front-end circuits
suited to the multiple-frequency-resonance antenna and a common receiver circuit
for SNR maximization. The section consists of wide-band amplifiers, multi-band
frequency converters, an automatic gain controller, and IQ signal interfaces.

The architecture of the OBT for DSRC is constructed with layers 1, 2, and 7 of
the ISO-OSI layer model. In order to realize adaptability to the communication
systems of DSRC, the layer 1 signal processing related to those systems needs to
be carried out by software. As shown in Table 2, the experimental modem supports
multiple modulation systems, multiple frame formats, and multiple symbol rates.
Furthermore, using a download controller in Fig. 1, even a new ITS service in
the future is expected to become available by downloading the modem software
for that service. Thus, the software definable modem system is essential for a
seamless-connection-communication architecture [2] in ITS services.
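
To illustrate what layer 1 processing in software can look like for one of the two modulation systems, the following minimal sketch differentially encodes bit pairs into π/4 DQPSK I/Q symbols. The dibit-to-phase-step table is the Gray-coded mapping commonly used for π/4 DQPSK and is an assumption here, since the paper does not state its mapping. Pulse shaping, framing and the GMSK mode are omitted, and all names are illustrative.

```python
import cmath
import math

# Assumed Gray-coded dibit -> phase increment table (not taken from the paper).
PHASE_STEP = {
    (0, 0): math.pi / 4,
    (0, 1): 3 * math.pi / 4,
    (1, 1): -3 * math.pi / 4,
    (1, 0): -math.pi / 4,
}


def pi4_dqpsk_modulate(bits):
    """Map an even-length bit sequence to unit-magnitude complex I/Q symbols."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    phase = 0.0
    symbols = []
    for k in range(0, len(bits), 2):
        # The information is carried by the phase change, not the absolute phase.
        phase += PHASE_STEP[(bits[k], bits[k + 1])]
        symbols.append(cmath.exp(1j * phase))
    return symbols


def symbol_rate(bit_rate_bps):
    """Two bits per symbol, so 4 Mbps corresponds to 2 Msymbol/s."""
    return bit_rate_bps / 2


iq = pi4_dqpsk_modulate([0, 0, 0, 1, 1, 1, 1, 0])
for s in iq:
    print(f"I={s.real:+.3f}  Q={s.imag:+.3f}")
```

Keeping the mapping in a table is one way to make the modulation system a downloadable parameter rather than fixed logic.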

The digital modem is built mainly from digital signal processors (DSPs) and
field programmable gate arrays (FPGAs). The continuing evolution of such
semiconductor devices will lead to the inter-media multimode terminal that is
the final target.



5 Conclusions 

A future ITS info-communication system using DSRC was discussed. As the first
step towards a future inter-media multimode terminal, an intra-media multimode
wireless communication terminal using software defined modem technology was
studied in order to accommodate several DSRC service networks. The experimental
terminal with a high data rate of 4 Mbps was presented. It enabled multi-modulation-system
(π/4 DQPSK and GMSK), multi-frame-format, and multi-bit-rate.



Abstract. A practical function of traffic engineering in IP-based networks is the
mapping of traffic onto the network infrastructure to achieve specific
performance objectives, which may be traffic-oriented and/or resource-oriented.
Simulating this traffic engineering function is necessary when planning and
optimizing network resources. This paper describes a new model for the
analytical simulation of traffic mapping in IP-based networks with respect to
both real-time and best-effort traffic. We conceptualize the simulation mainly
as a behavioral modeling of network dynamics. We then formulate this behavioral
model mathematically as two sub-optimization tasks, corresponding to the
delivery of real-time and of best-effort traffic. As an example of our algorithm
implementations, a genetic algorithm for real-time traffic mapping is presented.
The article also gives some numerical results from practical examples.



1 Introduction 

A practical function of traffic engineering in IP-based networks is the mapping
of traffic onto the network infrastructure to achieve specific performance
objectives, which may be traffic-oriented and/or resource-oriented [1].
Traffic-oriented performance objectives relate to improving the QoS provided to
Internet traffic. Resource-oriented performance objectives relate to optimizing
the network utilization. Simulating such traffic mapping is necessary when
planning and optimizing network resources. Most present network simulation
models focus primarily on the analytical simulation of traffic engineering in
traditional data networks, in which the traffic is considered as a single class
of "best-effort" service and is defined by a demand matrix describing the
traffic requirements between all switch-to-switch or router-to-router pairs. In
such approaches, the traffic is modeled only as a bundle of data flows,
regardless of priorities and QoS requirements [2, 3, 4].

In comparison with data-oriented applications, real-time applications have
special characteristics which are still not considered in the present
approaches. In particular, these applications are typically less elastic and
less tolerant of delay variation than data applications. They require
guarantees on capacity constraints (e.g. peak rates, mean rates, burst sizes)
and on QoS constraints (e.g. packet loss, delay) from the Internet. Thus, these
constraints of the real-time traffic should be considered in traffic
engineering simulations.

This paper describes a model for the analytical simulation of both real-time
and best-effort traffic mapping in IP-based networks. The remainder of this
paper is organized as follows. In section 2, the model for simulating the
traffic mapping is presented. In section 3, a genetic algorithm for real-time
traffic mapping is described as an example of our algorithm design. The results
of some practical examples are shown in section 4. Finally, the conclusions and
future work are given in the final section.



2 The Simulation Model 

The simulation of traffic mapping is formulated as follows:

Given are

• a fixed IP-based backbone infrastructure, the capacities and buffering of its
components, the node locations and the link topology, K classes of real-time
traffic described by traffic parameters and QoS requirements, and 1 class of
"best-effort" traffic described by a given demand matrix and packet delay
requirements.

To determine

• a cost-effective utilization of the given backbone by the real-time and
best-effort traffic.




Fig. 1. An overview of the hybrid simulation model for the traffic engineering

This simulation problem is conceptualized, on the one hand, as a structural
modeling and, on the other hand, as a behavioral modeling of the IP network
(figure 1). The structural modeling focuses on the abstract representation of
the IP backbone as a graph G = (V, E), in which V is a set of nodes and E is a
set of edges. A node represents a network router, and an edge represents a
physical link connecting one router to another. The behavioral modeling deals
with the simulation of the traffic mapping with respect to network dynamics,
such as the traffic characteristics, RSVP, MPLS, constraint-based routing, OSPF,
and resource allocation [5, 6, 7, 8, 9]. In this paper, we are concerned with
the simulation of traffic mapping at the IP layer in relation to the Integrated
Services architecture. The goal is to find an optimal network utilization for
given traffic and QoS requirements, so that the cost of the bandwidth needed is
minimal.

For this traffic mapping, we use the following simple scheduling discipline:
full priority is given to the real-time traffic, and the channel and node
capacities that are not used by the real-time traffic are made available to the
"best-effort" traffic. Based on this assumption, the traffic mapping consists of
two steps. First, the real-time traffic is mapped onto the given fixed IP
backbone. After all real-time traffic demands have been mapped, the
"best-effort" traffic is mapped onto this backbone, taking its spare capacity
into account.



2.1 Mapping the Real-Time Traffic 

The mapping of real-time traffic is an analytical simulation of its
transmission over real IP-based networks. Such a transmission process can be
seen as a sequence of network mechanisms carried out on a heterogeneous set of
Integrated Services routers running different routing protocols and using
different forwarding algorithms. These mechanisms belong to the Integrated
Services architecture [5,7] and relate to path selection, traffic
characteristics, resource allocation, and traffic assignment to the established
path, which may have been selected by a routing protocol or by some other means.

The path selection is done by Integrated Services routers, which perform
constraint-based routing, admission control, and resource allocation based on
the information contained in the source traffic specification and in the desired
service specification. Together, these specifications are generally referred to
as the traffic characteristics. In this paper, the traffic characteristics are
used as an input for our traffic mapping algorithms.

Constraint-based routing refers to a class of routing systems that compute
routes through a network subject to the satisfaction of a set of constraints and
requirements [4,5]. Constraints may include bandwidth, hop count, delay, and
resource class attributes. The other important mechanisms simulated in our work
are RSVP and the resource allocation. With RSVP, the application source (sender)
transmits a Path message along the routed path to the unicast or multicast
destination (the receiver) [3]. The purpose of the Path message is twofold: to
mark the routed path between the sender and the receiver, and to collect
information about the QoS viability of each router along the path. The resource
allocation is a mechanism for determining efficient bandwidths to be reserved on
the routers and on the connections along a path computed by constraint-based
routing.




746 T.T.M. Hoang and W. Zom 



The main principle of the real-time traffic mapping is described in figure 2. 



Modeling and characterizing the real-time traffic

repeat {
    Selecting the path for projecting the real-time flow i;
    // Allocating the resource on the selected path for flow i:
    determining the resource to be reserved;
    calculating the spare capacity for nodes and links;
} until (all real-time flows are mapped)



Fig. 2. Principle of real-time traffic mapping



The traffic mapping includes three steps: characterizing the real-time traffic,
selecting the path onto which the flows are projected, and allocating the
resource on the selected path. To characterize the real-time traffic at the IP
layer, we use the traffic specification Tspec and the service specification
Rspec to model the real-time flows. The traffic specification is mainly based
on a simple token bucket (LB), a peak rate p, a minimum policed unit m, and a
maximal packet size M. The token bucket has a bucket depth, b, and a bucket
rate, r. The service specification defines the rate R to be reserved for a given
real-time flow. To schedule packets for outbound transmission from routers, we
assume that the Integrated Services routers in our modeled network
infrastructures support WFQ, which is used today to guarantee QoS in IP-based
networks.

Fig. 3. Selecting the path from s to d for mapping the traffic characterized by Tspec

The goal of path selection for a given pair of source and destination nodes is
to find a cost-effective path for projecting the real-time traffic demands
between these nodes onto the given fixed IP backbone. Our path selection
algorithm is a hybrid simulation of RSVP and the constraint-based routing
mechanism. The algorithm starts at the receiver and searches for the next
neighbor of the current router towards the sender (Fig. 3). A subtask developed
within this algorithm is a simulation of the admission control, which determines
whether the router has sufficient available resources to satisfy the QoS
request. The difference from conventional depth-first search and from OSPF and
BGP is that our algorithm considers the Tspec, the Rspec, the spare capacity on
the routers and on the links, and the hop number while selecting the path.
Furthermore, our algorithm takes the IP routing for real-time flows into
consideration. This algorithm is implemented as a genetic algorithm.
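
The idea of a receiver-initiated search with admission control can be sketched as follows. This is a simplified illustration, not the genetic algorithm of the paper: it assumes a plain breadth-first search, a single requested rate as the only admission criterion, and an undirected topology. The function and parameter names are hypothetical.

```python
from collections import deque


def select_path(links, node_spare, link_spare, sender, receiver, rate):
    """Return a minimum-hop sender->receiver path whose nodes and links all
    have spare capacity >= rate, or None if admission control rejects all."""
    def admits(node):
        return node_spare.get(node, 0.0) >= rate

    if not (admits(sender) and admits(receiver)):
        return None
    # The search starts at the receiver and walks towards the sender.
    parent = {receiver: None}
    queue = deque([receiver])
    while queue:
        node = queue.popleft()
        if node == sender:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path  # already ordered sender -> receiver
        for nxt in links.get(node, ()):
            edge = frozenset((node, nxt))
            if (nxt not in parent and admits(nxt)
                    and link_spare.get(edge, 0.0) >= rate):
                parent[nxt] = node
                queue.append(nxt)
    return None


links = {"s": ["a", "b"], "a": ["s", "d"], "b": ["s", "c"],
         "c": ["b", "d"], "d": ["a", "c"]}
node_spare = {"s": 10, "a": 1, "b": 10, "c": 10, "d": 10}
link_spare = {frozenset(e): 10 for e in
              [("s", "a"), ("a", "d"), ("s", "b"), ("b", "c"), ("c", "d")]}
print(select_path(links, node_spare, link_spare, "s", "d", rate=2))
```

In the example, router a is rejected by admission control for a rate of 2, so the longer path through b and c is selected.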








Simulation of Traffic Engineering in IP-Based Networks 747 



The resource allocation on the selected path O is formulated as follows. Given
are the traffic specification Tspec, a selected path O consisting of p nodes,
and the parameters described above. To be found is the resource
$R = \{R_1, R_2, \ldots, R_i, \ldots, R_p\}$ to be allocated at the nodes i
along O, taking into account the spare capacities $S_i$ at the nodes i belonging
to O and the upper bound $d_0$ of the end-to-end delay. This problem is modeled
as a local optimization problem as follows:

To optimize

$$F = \sum_{i \in O} \big[\ldots (S_i - R_i) \ldots\big] + F_0 \qquad (1)$$

Subject to

$$R_i\,[\ldots] \quad \text{and} \quad d_{\mathrm{req}} < d_0 \qquad (2)$$

Here $d_{\mathrm{req}}$ is the desired queueing delay bound for the path O, and
$F_0$ is a constant calculated from the traffic parameters and the spare
capacities at the nodes. To be found are the resources $R_i$ to be allocated at
the nodes $i \in O$ so that the cost function F is minimal. This optimization
task is a nonlinear combinatorial optimization problem, which is solved using
the Lagrangian procedure and the convexity of the cost function. The rate to be
allocated at the nodes i belonging to O is then computed as follows:
$R_i = \max\{r, R_i\}$.
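
The interplay between the delay bound and the reserved rate can be illustrated with the end-to-end queueing delay bound of the IETF guaranteed service (RFC 2212). The sketch below is not the cost function (1) of the paper. It assumes a single rate R for the whole path, the RFC 2212 bound with error terms c_tot and d_tot defaulting to zero, and a plain bisection instead of the Lagrangian procedure. The final rate is never below the bucket rate r, in line with the max{r, R} rule. Units (bits and bits per second) and the reading of kb as kilobits are assumptions.

```python
def delay_bound(R, b, r, p, M, c_tot=0.0, d_tot=0.0):
    """End-to-end queueing delay bound of RFC 2212 for reserved rate R."""
    if R < r:
        raise ValueError("reserved rate must be at least the bucket rate r")
    if p > R:
        return (b - M) / R * (p - R) / (p - r) + (M + c_tot) / R + d_tot
    return (M + c_tot) / R + d_tot


def min_reserved_rate(d0, b, r, p, M, spare, c_tot=0.0, d_tot=0.0):
    """Smallest R in [r, spare] whose delay bound is <= d0, or None."""
    if spare < r or delay_bound(spare, b, r, p, M, c_tot, d_tot) > d0:
        return None                      # not enough spare capacity
    if delay_bound(r, b, r, p, M, c_tot, d_tot) <= d0:
        return r                         # R = max{r, R}: never go below r
    lo, hi = r, spare                    # bound fails at lo, holds at hi
    for _ in range(100):                 # the bound decreases as R grows
        mid = (lo + hi) / 2
        if delay_bound(mid, b, r, p, M, c_tot, d_tot) <= d0:
            hi = mid
        else:
            lo = mid
    return hi


# "Video Conference" row of Table 1 (section 4), assuming kb means kilobits.
video = dict(b=10e3, r=0.5e6, p=10e6, M=1.5e3)
print(min_reserved_rate(d0=0.1, spare=100e6, **video))   # loose delay bound
print(min_reserved_rate(d0=0.01, spare=100e6, **video))  # tight delay bound
```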

After the reserved rate R is determined, the spare capacities of each node and
each link belonging to path O are recalculated. Based on these backbone spare
capacities, the best-effort traffic mapping is done using the flow deviation
method described in the next section.



2.2 Mapping the Best-Effort Traffic 

The "best-effort" traffic mapping is an analytical simulation of best-effort
traffic transmission over packet networks. This transmission can be seen as an
approximation of OSPF and Border Gateway Protocol (BGP) routing. Our
"best-effort" traffic mapping is based on the assumption that the packet
arrival process is a Poisson process with exponentially distributed
interarrival times and that the links are modeled as independent M/M/1/K FCFS
queues. The traffic mapping is formulated as the following optimization task:

To optimize

$$T = \frac{1}{\gamma} \sum_{i \in E} \frac{f_i}{C_i - f_i} \qquad (3)$$

Subject to

$$T < T_0 \quad \text{and} \quad \frac{f_i}{C_i} < 1 \qquad (4)$$

Whereby

$$f_i = \sum_{p \in P:\; i \in p} \gamma_p, \quad i \in E \qquad (5)$$

Here T is the total network delay based on Kleinrock's independence assumption
and Little's formula [4], $f_i$ is the flow on link i, $C_i$ is the capacity of
link i, P is the set of paths between any pair of demand nodes, p is a path in
P, $\gamma_p$ is the traffic demand on the path p, and $\gamma$ is the sum of
all traffic demand entering the network.
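
A direct evaluation of (3) and (5) can be sketched as follows. The link indices, capacities and demands in the example are made up for illustration, and flows and capacities are assumed to be expressed in the same unit.

```python
def link_flows(path_demands, num_links):
    """Eq. (5): f_i is the sum of the demands of all paths that use link i.
    path_demands is a list of (links_on_path, demand) pairs."""
    flows = [0.0] * num_links
    for links, demand in path_demands:
        for i in links:
            flows[i] += demand
    return flows


def total_delay(flows, capacities, gamma):
    """Eq. (3): T = (1/gamma) * sum_i f_i / (C_i - f_i)."""
    total = 0.0
    for f, c in zip(flows, capacities):
        if f >= c:
            raise ValueError("constraint f_i / C_i < 1 violated")
        total += f / (c - f)
    return total / gamma


# Two demands sharing link 1 (illustrative numbers only).
paths = [([0, 1], 2.0), ([1, 2], 3.0)]
flows = link_flows(paths, num_links=3)
gamma = sum(demand for _, demand in paths)
print(flows, total_delay(flows, [10.0, 10.0, 10.0], gamma))
```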

We solve this optimization task using the well-known "flow deviation" (FD)
method [2]. The FD method is based on the observation that if a small amount of
flow is moved from a path with larger incremental delay to a path with smaller
incremental delay, then the total delay decreases. The incremental delay of a
path is simply the sum of the incremental delays of all links in the path. The
FD method proceeds by assigning lengths to the links based on their incremental
delay, then finding the shortest path from each source to each destination
based on these lengths, and mapping the traffic demand onto the links along
these paths. The flow pattern found in this way is then superposed on the
previously found flow pattern. That is, a new flow pattern is formed by sending
part of the flow on the old paths and part of the flow on the new paths. The
amount of flow deviated to the new paths is chosen to minimize the total delay.
The FD method is implemented in our model as a genetic algorithm.
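
One iteration of the method described above can be sketched as follows. This is a plain deterministic version, not the genetic implementation of the paper. It assumes directed links, a feasible starting flow, the delay function (3), and a ternary search for the deviated fraction. All names are illustrative.

```python
import heapq


def delay(flow, cap, gamma):
    """Total delay (3); infinite if any link is saturated."""
    total = 0.0
    for link, f in flow.items():
        if f >= cap[link]:
            return float("inf")
        total += f / (cap[link] - f)
    return total / gamma


def shortest_path(cap, length, src, dst):
    """Dijkstra over directed links (u, v); returns the list of links used."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for (a, b) in cap:
            if a == u and d + length[(a, b)] < dist.get(b, float("inf")):
                dist[b] = d + length[(a, b)]
                prev[b] = (a, b)
                heapq.heappush(heap, (dist[b], b))
    path, node = [], dst
    while node != src:
        path.append(prev[node])
        node = prev[node][0]
    return path[::-1]


def fd_step(flow, cap, demands):
    """One flow-deviation iteration; returns the new flow pattern."""
    gamma = sum(g for _, _, g in demands)
    # Link length = incremental delay dT/df_i = C_i / (gamma * (C_i - f_i)^2).
    length = {l: cap[l] / (gamma * (cap[l] - flow[l]) ** 2) for l in cap}
    target = {l: 0.0 for l in cap}
    for src, dst, g in demands:
        for l in shortest_path(cap, length, src, dst):
            target[l] += g

    def mix(a):
        return {l: (1 - a) * flow[l] + a * target[l] for l in cap}

    lo, hi = 0.0, 1.0
    for _ in range(60):  # ternary search; the delay is convex along the segment
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if delay(mix(m1), cap, gamma) <= delay(mix(m2), cap, gamma):
            hi = m2
        else:
            lo = m1
    return mix(lo)


cap = {("s", "d"): 10.0, ("s", "a"): 10.0, ("a", "d"): 10.0}
demands = [("s", "d", 8.0)]
flow = {("s", "d"): 8.0, ("s", "a"): 0.0, ("a", "d"): 0.0}
for step in range(3):
    flow = fd_step(flow, cap, demands)
    print(step, round(delay(flow, cap, 8.0), 4))
```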

The result of the traffic mapping process is the network throughput which can be 
used to obtain predictions and formulate control strategies under various conditions as 
well as to guide network upgrading plans. 



3 Genetic Algorithms for Real-Time Traffic Mapping 

A genetic algorithm (GA) is an optimization technique based on the mechanisms
of genetic adaptation in biological systems [10]. The algorithm maintains a
population of candidate solutions to the given problem. In one of the most
commonly used representations, a solution in the search space is a string
represented by a sequence of characters or numbers. This solution string is
called the chromosome. The quality of a chromosome is judged by its fitness
values, which indicate which chromosomes have a better potential of being
carried to the next generation.

A genetic algorithm starts with an initial population of solutions and simulates the 
process of evolution. After a number of generations, highly fit chromosomes will 
emerge corresponding to good solutions to the given problems. 

The four main steps in the development of a genetic algorithm are the
chromosomal coding schema, reproduction/selection, crossover, and the
computation. The first step is the process of encoding the chromosomes. The
second step concerns the selection of potentially good strings from the current
generation to be carried to the next generation. The crossover is the process of
shuffling two randomly selected strings to generate new offspring strings. The
last step, the computation, relates to calculating the fitness value using the
objective function.

The population size is finite in each generation of a GA, which implies that
only relatively fit chromosomes in generation i are carried to the next
generation (i+1). The process of selection, crossover, and mutation is repeated
until the termination condition is satisfied.

Our traffic engineering simulation presented in the previous sections is based
on two algorithms running in tandem, one for real-time traffic mapping and one
for best-effort traffic mapping. In this section, the genetic algorithm GA1 for
the real-time traffic mapping is outlined in detail.

For the implementation of GA1, networks are modeled in the form of chromosomes.
A chromosome is a list of real values describing the spare capacities of routers
and links, together with the fitness values. One of the fitness values is the
hop number; the others are the minimum of the spare capacities of the nodes and
the minimum of the spare capacities of the links.

Chromosomal coding-schema. In GA1, we use two types of chromosomes, namely the
Network Chromosome (NC) and the Path Chromosome (PC). An NC chromosome describes
the current capacity reserve of the whole IP-based network. The first n fields
of the NC chromosome describe the spare capacities of the corresponding nodes.
The next m fields describe the spare capacities of the corresponding links. The
fitness value is described in the last three fields of the NC chromosome. These
fields are the hop number, the minimum of the node spare capacities, and the
minimum of the link spare capacities. At each step of the computation process, a
new NC chromosome can be computed via crossover/mutation between the old NC
chromosome and the best PC chromosome. In this case, the new NC chromosome
replaces the old one. A PC chromosome describes the capacity reserves belonging
to a possible path from a given source node s to a given destination node d.
This path can be used for delivering the given real-time traffic demand from s
to d. In contrast to an NC chromosome, in a PC chromosome only the fields
describing the nodes and links belonging to the given path are set to their
spare capacities. The other fields are set to null.
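
The coding schema can be illustrated with plain Python lists. The sketch below assumes that nodes and links are identified by their index, that null is represented by None, and that the hop number of a path equals its number of links. The hop field of the NC chromosome is simply set to 0, since the text does not say how it is filled. The helper names are hypothetical.

```python
def make_nc(node_spare, link_spare):
    """Network chromosome: n node fields, m link fields, then 3 fitness fields
    (hop number, min node spare capacity, min link spare capacity)."""
    return (list(node_spare) + list(link_spare)
            + [0, min(node_spare), min(link_spare)])


def make_pc(nc, n, m, path_nodes, path_links):
    """Path chromosome: spare capacities are copied only for the nodes and
    links on the path; all other capacity fields stay None (null)."""
    pc = [None] * (n + m) + [0, 0.0, 0.0]
    for i in path_nodes:
        pc[i] = nc[i]
    for j in path_links:
        pc[n + j] = nc[n + j]
    pc[-3] = len(path_links)                      # hop number
    pc[-2] = min(nc[i] for i in path_nodes)       # min node spare capacity
    pc[-1] = min(nc[n + j] for j in path_links)   # min link spare capacity
    return pc


nc = make_nc(node_spare=[10, 4, 8], link_spare=[5, 6])
pc = make_pc(nc, n=3, m=2, path_nodes=[0, 2], path_links=[1])
print(nc)
print(pc)
```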

Initialization. At the beginning of GA1, the initial NC chromosome describes the
input resources of the IP-based network. The set of PC chromosomes is initially
set to null. For each real-time traffic demand from node s to node d, GA1 first
determines an initial set of PC chromosomes. Each of these chromosomes
represents the available resources on a possible path from s to d.

Computing the fitness value. For each PC chromosome found above, the fitness 
is calculated and set into the last three fields of this chromosome. 

Reproduction and Selection. From the initial set of PC chromosomes for a given
traffic demand, the chromosomes whose minimal node capacity or minimal link
capacity is less than the sum of the peak rates of the real-time flows belonging
to this traffic demand are removed first. Only the PC chromosomes with the
minimal hop number are kept in the set. After that, the chromosome having the
minimal node capacity is selected for the next generation. We name this
chromosome PC_best, meaning that it is the chromosome chosen for the traffic
demand from node s to node d. For PC_best, the rate R to be reserved on the
selected path is determined.

Crossover. In each iteration of GA1, two crossover actions are performed. The
first crossover is carried out on the chromosome PC(s,d): its nonzero fields are
overwritten by the value of the rate R. The second crossover subtracts the
PC(s,d) chromosome from the NC chromosome.

Termination condition. GA1 terminates when all traffic demands have been
removed from the real-time traffic demand set T_R, that is, when the set T_R is
null. This means that all real-time traffic demands have been mapped onto the
IP-based backbone. The routing of IP flows is considered during the genetic
operations. An overview of the algorithm GA1 is shown in figure 4.



begin GA1
    NC initialization;
    T_R is the set of real-time traffic demands;
    while (T_R != null) do
    begin
        select t_sd ∈ T_R;  // t_sd = {source s, destination d,
                            //         characteristics of real-time flows}
        remove t_sd from T_R;  // so that the termination condition is reached
        PC_set ← null;
        begin
            PC_set ← finding an initial set of PC chromosomes belonging to t_sd;
            computing the fitness value of all chromosomes in PC_set;
            // reproduction, selection:
            PC_best ← selecting the best PC chromosome in PC_set;
            calculating the rate R to be reserved from PC_best;
            for all v ∈ PC_best: IF v ≠ zero THEN v = R;  // Crossover 1
            NC ← NC − PC_best;  // Crossover 2
        end;
    end;
end GA1



Fig. 4. Genetic algorithm for real-time traffic mapping
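
The loop of Fig. 4 can be sketched in runnable form as follows. This is an interpretation under several assumptions, not the authors' implementation: candidate paths and the rate R are supplied by the caller instead of being computed, a demand without an admissible path is skipped, and the chromosomes are kept as two plain lists of node and link spare capacities. All names are illustrative.

```python
def ga1(node_spare, link_spare, demands):
    """demands: list of dicts with keys 'peak_sum', 'rate' (R to reserve) and
    'paths' (candidate (node_indices, link_indices) pairs). Returns the final
    spare capacities and the chosen path (or None) for each demand."""
    nodes, links = list(node_spare), list(link_spare)   # NC capacity fields
    chosen = []
    for dem in demands:                                  # until T_R is empty
        # Initial PC set with fitness (hops, min node spare, min link spare).
        pcs = [(len(pl), min(nodes[i] for i in pn),
                min(links[j] for j in pl), pn, pl)
               for pn, pl in dem["paths"]]
        # Selection: drop paths that cannot carry the sum of the peak rates.
        pcs = [pc for pc in pcs
               if pc[1] >= dem["peak_sum"] and pc[2] >= dem["peak_sum"]]
        if not pcs:
            chosen.append(None)
            continue
        # Keep the minimal hop number, then take the minimal node capacity.
        min_hops = min(pc[0] for pc in pcs)
        best = min((pc for pc in pcs if pc[0] == min_hops),
                   key=lambda pc: pc[1])
        _, _, _, pn, pl = best
        # Crossover 1 puts R into the path fields; crossover 2 subtracts
        # the resulting PC from the NC.
        for i in pn:
            nodes[i] -= dem["rate"]
        for j in pl:
            links[j] -= dem["rate"]
        chosen.append((pn, pl))
    return nodes, links, chosen


paths = [([0, 1, 3], [0, 1]), ([0, 2, 3], [2, 3])]
demands = [
    {"peak_sum": 6, "rate": 4, "paths": paths},
    {"peak_sum": 7, "rate": 5, "paths": paths},
]
print(ga1([20, 10, 10, 20], [10, 10, 10, 10], demands))
```

In the example, the first demand uses up part of the upper path, so the second demand is admitted only on the lower path.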



4 Numerical Experiments 

Our model has been successfully applied to simulating traffic engineering in
IP-based network infrastructures. Several IP backbones with different sets of
routers, links, traffic demands, and traffic characteristics were used as the
tested IP-based networks.










Table 1. Characteristics of selected real-time flows

Traffic characteristic       64 Kb/s voice    Video Conference    Stored video
bucket depth b [kb]          0.1              10                  100
bucket rate r [Mbps]         0.064            0.5                 3
peak rate p [Mbps]           0.064            10                  10
max. packet size M [kb]      0.1              1.5                 1.5
End-to-end delay bound       50               100                 100



Table 2. The tested IP-based network structures

Network name    Node number    Link number    Number of real-    Number of best-
                                              time traffic       effort traffic
Network 1       25             80             460                600
Network 2       49             84             552                2352
Network 3       100            180            920                9900
Network 4       121            220            1012               14520
Network 5       144            264            1104               20529
Network 6       169            312            1196               28392




In our experiments, we consider one class of best-effort traffic and three
classes of real-time traffic as the input traffic sources. The best-effort
traffic is characterized by a traffic matrix. Each element of this matrix is a
tuple describing the IP traffic demand in number of packets and in volume
(Mbps), approximating the traffic from one router to another. The real-time
traffic is shown in table 1. The input data for the tested IP backbone
infrastructures are described in table 2.

The simulation model is implemented in Java (JBuilder) under Windows NT. The
computation was carried out on an Intel Pentium processor 333 and finished
within a few minutes. Figure 5 shows the runtime of the simulation algorithm.
The runtime depends on the complexity of the IP backbone infrastructure
(table 2), the number of real-time and best-effort traffic demands, the number
of genetic strings per generation, and the total number of generations.



5 Conclusions and Outlook 

The main contributions of this paper are the following. We described the 
analytical simulation of traffic engineering applied to the mapping of both "best-effort" 
and real-time traffic onto an IP-based network. We conceptualized this traffic 
mapping on the one hand as an abstract representation of the network infrastructure 
and, on the other hand, as a behavioral model of the network dynamics relating to 
IntServ, RSVP, CR, OSPF, BGP and resource allocation. We modeled this dynamic 
network behavior with two sub-optimization tasks, which are solved using genetic 
algorithms implemented in Java with JBuilder under Windows NT. Computational 
tests with different IP-based backbone topologies were carried out with insightful 
results. Our work can be 
used to obtain predictions and formulate control strategies under various conditions as 
well as to guide network upgrading plans. Our future work deals with the extension of 
our model for traffic mapping relating to Multi Protocol Label Switching (MPLS). 



Abstract. In this paper, we present an algorithm for available bandwidth 
measurement of a path between two hosts as well as some preliminary 
simulation results. The measurement algorithm is based on active probing with 
two techniques we have developed: variable speed probing and zoom-in/zoom- 
out. Compared with previous work, the algorithm has the advantage of low 
overhead and fast convergence because it relies on the detection of traffic trends 
(with variable speed probing) rather than any specific properties of probing 
samples. The measurement can self-adapt to any bandwidth ranges (with zoom- 
out) and respond to accuracy requirements (with zoom-in). Therefore, no 
knowledge about the bottleneck bandwidth of the measured path is required. 
We are currently experimenting with self-similar traffic over a real network 
environment to gain more experience and to further validate and improve the 
measurement techniques. 



1 Introduction 

The available bandwidth of a path between two network points is one of the important 
dynamic network characteristics used for optimizing resource utilization in traffic 
engineering and for admission control in quality of service. Since available bandwidth 
is a dynamic parameter, it must be determined based on both the capacity of the links 
and the current traffic that the links carry. Consequently, an effective measurement of 
available bandwidth can also serve as the means of determining the current traffic. 
Due to the dynamic nature of the traffic, available bandwidth has to be measured 
frequently, and how frequently depends on how much the traffic fluctuates. That is, the 
interval between successive measurements should be a parameter to maintain the 
timeliness and usefulness of the measurement results. This requires that the 
measurement overhead be kept as low as possible while the desired measurement 
accuracy is still achieved. 

In this paper, we present a new algorithm and the techniques for available 
bandwidth measurement that we have proposed and implemented for evaluation. We 
also present some preliminary simulation results to validate the measurement 
techniques. In the algorithm, we do not assume any knowledge about the capacity of 
the network or links, nor do we assume any knowledge about the traffic. Rather, we 
explore some distinctive properties of traffic behavior through active probing and 
detect the changes in the derivation of the measurement results. This algorithm is also 
very general in the sense that it responds to accuracy requirements and can 
automatically adapt itself to measure links of any bandwidth. 

This paper is organized as follows. In the next section, we briefly review some 
previous work related to available bandwidth measurement. In Section 3, we describe 
the algorithm and the techniques. In Section 4, we present and discuss some 
preliminary simulation results. Finally, we conclude this paper in Section 5. 



2 Related Work 

Network measurement has been the subject of a number of studies during the past few 
years. Bolot [2] analyzed end-to-end packet delay and loss in the Internet and used a 
phase plot to characterize the phenomenon in a congested network. In the study, he 
showed that, when multiple packets are transmitted at a low speed, the plot for the 
round trip delay was distributed randomly and, when at a high speed, the plot formed 
a distinctive pattern of distribution. The explanation for the different patterns can be 
attributed to network congestion when the packets are transmitted at the high speed. 
NEPRI [1] is a network measurement tool developed by Fujitsu Laboratories Limited 
based on this theory in which one or more rounds of probing packets are used and the 
phase plots are drawn to detect the pattern that corresponds to network congestion. 
The probing packets in each round are sent at a fixed speed. Based on the phase plot 
for the current round, the probing speed for the next round is adjusted upward or 
downward. As soon as a desired pattern is detected and the result is computed 
satisfactorily, the measurement is finished with the bandwidth result. 

Jacobson [5] developed a measurement tool called pathchar for the measurement of 
bottleneck bandwidth of the links along a path. During the measurement, a number of 
probing rounds with different packet sizes are used to probe each intermediate node 
along the path until the desired destination is reached. The tool has been further 
evaluated by Downey [4], whose experience showed that pathchar could yield 
reasonably accurate measurement results. However, because pathchar is for deriving 
the physical characteristics, i.e., the capacity, of the links in the path, the overhead is 
very high. This is prohibitive for available bandwidth measurement, which has to be 
performed frequently, not to mention that the method may not be readily applicable to 
available bandwidth measurement. 

Bprobe/Cprobe [3] uses multiple rounds of probing messages to calculate the 
bottleneck and the available bandwidth of a path. The measurement result is 
computed based on all the probing rounds at different speeds and with different 
packet sizes. Therefore, the overhead of the measurement is very high because a fixed 
number of multiple rounds of probing are always needed. While this may be 
acceptable for the bottleneck bandwidth measurement, the overhead for 
available bandwidth measurement must be kept to a minimum. In addition, probing at the 
bottleneck speed for available bandwidth measurement is too high a price to pay, not 
to mention the dependency of the measurement on the knowledge about the 
bottleneck bandwidth. 



3 The Measurement Algorithm 

We use active probing in the measurement and regard the network that connects the 
measurement node, or the agent, and the measured node, or the server, as a black box. 
The agent invokes the measurement and derives the result by probing the server with 
a series of probing packets, which triggers the acknowledgment packets. The agent 
also uses a timer for each packet to record its round trip time RTT. We describe how 
the available bandwidth can be derived from RTT along with some other information 
about the probing packets in the algorithm. 

Let $T_S$ be the sending time of a probing packet and $T_R$ the receiving time of the 
acknowledgement packet. Then the RTT of the probing packet is 

$RTT = T_R - T_S$. 

If there is no congestion, the RTT is about the same for every probing packet. If 
there is, the RTT may be larger for the probing packet because of the additional delay 
caused by the congestion. Therefore, by comparing the RTTs for different probing 
packets, we are able to determine if network congestion occurs. On the other hand, 
since we don't know the minimum RTT, we need to use more than one packet to get 
different RTTs. The key is how to systematically generate probing packets that 
yield different and meaningful RTTs. If the RTTs increase, we can conclude 
that congestion has occurred. However, we still cannot conclude that congestion does 
not affect the probing packet with the smaller RTT because of the lack of knowledge 
of the minimum RTT. Nevertheless, the network phenomenon that we can observe in 
the case of congestion is that of larger RTTs. In general, the worse the congestion 
gets, the larger the RTT becomes. Therefore, the only practical way to measure the 
available bandwidth is to cause a brief congestion on a link in the path. This requires 
that sufficient probing traffic be generated and sent through the link to cause the 
congestion. 

We now describe the basic techniques in the algorithm for available bandwidth 
measurement: variable speed probing and zoom-in/zoom-out. 



3.1 Variable Speed Probing 

We use a set of packets at a variable probing speed, from low to high, for the agent to 
probe the server. The optimal range of the speed is one with the property that some 
later packets would cause link congestion while some early ones wouldn’t. In this 
case, we will be able to observe different RTTs and derive the available bandwidth by 
detecting the congestion point based on the RTTs. This congestion point gives us the 
needed information to derive the available bandwidth. This is because, before this 
point, all probing packets should have the same RTT and, after this point, although 
the RTTs may be different, the general trend is consistent and the RTTs should be 
larger than that without congestion. Therefore, to achieve the objective, each 
measurement round would consist of probing packets that impose different bandwidth 
requirements on the link. That is, probing packets of different speeds (with the same 
packet size) or different sizes (with the same speed) should be used in the 
measurement. In the illustration, we will use the variable probing speed method, while 
the variable packet size is equally applicable. 

The bandwidth requirement on the link by a packet can be determined by its size S 
and the time interval t between it and the next: 

$S/t$. 

If we let the time interval between packets $R_i$ and $R_{i+1}$ be $t_i$, $1 \le i \le n-1$, 
and $t_i > t_{i+1}$, the bandwidth requirement of packet $R_i$ is 

$S/t_i$. 

Therefore, given the set of n probing packets $\{R_1, R_2, \ldots, R_n\}$ of size S but 
decreasing time intervals, the bandwidth requirement of the packets increases as they 
are emitted into the path. 

After collecting the RTTs at the probing agent, we use curve matching between the 
one for the sending probing packets (the sending curve) and the one for the receiving 
acknowledgement packets (the receiving curve). The sending curve is plotted using 
the emission time against the packet number and the time of the first packet is set to 0. 
The receiving curve is also plotted in the same way using the receipt time against the 
packet number and the time of the first packet is aligned to 0. These two curves are 
then plotted on the same plane, which is called curve matching. 



Fig. 1. Available Bandwidth Measurement (sending curve and receiving curve; time plotted against packet number) 

We illustrate this measurement technique with an example. In Fig. 1, assume that 
the 7 probing packets are sent out at intervals that start at 6 and decrease by 1 each 
time. Assume also that packet 4 reaches the speed where the bandwidth requirement 
causes the link to congest. Due to the congestion, all the subsequent packets will be 
affected and have an RTT equal to or greater than that of packet 4. 
The RTTs after the congestion point will increase so that the interval between the 
acknowledgement packets will lag behind that between the corresponding probing 
packets. The point in the receiving curve where it starts to diverge from the sending 
curve is the congestion point. The bandwidth requirement at this point is then used to 
calculate the available bandwidth. 
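
The curve matching in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single-queue bottleneck in `simulate_receipts`, the service time of 3.5 time units, the base delay of 10 and the function names are all hypothetical and not taken from the paper. The sketch also assumes one reading of the text, namely that the congestion point is the last packet before the aligned receiving curve rises above the aligned sending curve.

```python
def simulate_receipts(send_times, service_time, base_delay):
    """Toy single-queue bottleneck (an assumption, not the paper's simulator)."""
    receipts, finish = [], float("-inf")
    for s in send_times:
        # A packet is served when it arrives or when the queue drains.
        finish = max(s, finish) + service_time
        receipts.append(finish + base_delay)
    return receipts


def congestion_point(send_times, recv_times, tolerance=1e-9):
    """Return the 1-based number of the last packet before the receiving
    curve diverges from the sending curve, or None if the curves match."""
    s0, r0 = send_times[0], recv_times[0]   # align both curves to time 0
    for k in range(1, len(send_times)):
        lag = (recv_times[k] - r0) - (send_times[k] - s0)
        if lag > tolerance:
            return k   # index k is packet k+1, so packet k precedes the lag
    return None


# Seven packets sent at intervals 6, 5, 4, 3, 2, 1, as in the Fig. 1 example.
intervals = [6, 5, 4, 3, 2, 1]
send = [0]
for t in intervals:
    send.append(send[-1] + t)

recv = simulate_receipts(send, service_time=3.5, base_delay=10)
point = congestion_point(send, recv)

size = 1.0                                   # packet size S in arbitrary units
rate_at_point = size / intervals[point - 1]  # S / t_i at the congestion point
print(point, rate_at_point)
```

With these toy numbers the single-point estimate S/t_4 lies above the bottleneck rate of 1/3.5, which is the kind of overestimation that motivates using more than one point around the congestion point.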

Traffic fluctuation in the network may cause noise in the RTTs. As a result, the 
receiving curve may not be as smooth as that in Fig. 1. In such a circumstance, a trend 
line can be drawn for the receiving curve and estimation techniques can be used by 
taking into consideration a number of points around the congestion point. In 
addition, because the probing packets are discrete, the congestion speed may fall 
between the speeds of two adjacent packets. Therefore, using a single point in the calculation of the 
available bandwidth may cause underestimation or overestimation relative to the 
actual bandwidth. Consequently, it may be more appropriate to use more than one 
point in the calculation of the measurement results. 

Following is one way to determine the intervals between the probing packets. For 
simplicity, we assume that the intervals decrease linearly by an equal amount. Let the 
probing packet size be S, the number of packets be n, and the bandwidth range for the 
probing be represented by a range $(B_L, B_H)$, where $B_L$ is the lower end and $B_H$ the 
higher end. We further assume that $B_H > 0$, $B_L > 0$ and $B_H > B_L$; otherwise, the bandwidth 
range is meaningless. From these parameters, we can determine that the time interval 
decrement is $S(B_H - B_L)/(B_H B_L (n-2))$ and the time intervals can be computed using the 
formula: 

$t_i = \frac{S}{B_L} - (i-1)\,\frac{S(B_H - B_L)}{B_H B_L (n-2)}, \quad 1 \le i \le n-1.$ 
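
The equal-decrement scheme can be sketched as follows. This is a minimal illustration, assuming that the first interval corresponds to the lower end of the bandwidth range and the last interval to the higher end; the function and variable names are hypothetical, and the sample values (1024-bit packets, 30 packets, 500 Kbps to 10 Mbps, with decimal unit prefixes assumed) are only used to exercise the code.

```python
def probing_intervals(size_bits, n, b_low, b_high):
    """Return the n-1 inter-packet intervals in seconds, decreasing linearly.

    The first interval probes at b_low (bit/s), the last one at b_high.
    """
    if not 0 < b_low < b_high:
        raise ValueError("need 0 < b_low < b_high")
    if n < 3:
        raise ValueError("need at least 3 probing packets")
    first = size_bits / b_low            # longest interval, slowest speed
    last = size_bits / b_high            # shortest interval, fastest speed
    step = (first - last) / (n - 2)      # equal decrement between intervals
    return [first - i * step for i in range(n - 1)]


intervals = probing_intervals(1024, 30, 500e3, 10e6)
rates = [1024 / t for t in intervals]    # bandwidth requirement S / t_i
print(len(intervals), intervals[0], intervals[-1])
```

Because the intervals are spaced evenly in time rather than in bandwidth, the resulting probing speeds are packed more densely at the low end of the range than at the high end.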



We can also make other assumptions on the relationships between the time 
intervals and use the same technique to determine the intervals, which we are 
currently studying as a research topic on probing patterns that could yield better 
performance in terms of measurement accuracy and probing overhead. 



3.2 Zoom-In/Zoom-Out 

The zoom-in technique is used when the measurement detects the congestion point 
but the result does not meet the required accuracy. This would happen when the 
bandwidth range is large and, therefore, the result is computed by using points with 
large bandwidth differences. To improve the accuracy of the result, zoom-in would 
invoke a new round of measurement during which the bandwidth range is shrunk to a 
range around the congestion point. The selection of the new bandwidth range will 
determine the new $B_L$ and $B_H$ and the values of the time intervals $t_i$, $1 \le i \le n-1$. The 
selection of the new bandwidth range may depend on the quality of the previous 
measurement. If the previous measurement does not give a good curve to clearly 
identify the congestion point, we should not shrink the bandwidth range too much due 
to the risk of missing it altogether in the next round of measurement. Because of this 
limitation, multiple rounds of measurement invoked by zoom-in may be needed. This 
situation could happen if the previous bandwidth range is too large, the number of 
probing packets is too small, the accuracy requirement is too high, the network traffic 
is too volatile, among other factors. Therefore, we use an automatic procedure in the 
algorithm to determine whether zoom-in is needed after every round of probing and 
measurement. Each zoom-in should bring the measurement closer to the final goal, 
and the number of rounds eventually invoked is determined by the number of probing 
packets n, the size of the packet S, the accuracy requirement, the initial bandwidth 
range $(B_L, B_H)$ and the fluctuation of the traffic. A noisy network could result 
in more rounds before the desired result can be obtained. The zoom-in process can be 
viewed as a necessary procedure to improve the measurement result to meet the 
specified accuracy requirement. 

The zoom-out technique is just the reverse procedure of zoom-in. The purpose of 
zoom-out is to locate the congestion point when the measurement does not identify 
one. Since we don’t have any knowledge about the bottleneck bandwidth, we don’t 
always know what the highest bandwidth should be. Therefore, this mechanism can 
dynamically expand the bandwidth range to make the measurement algorithm 
adaptable to any network bandwidth. Zoom-out basically enlarges the measurement 
area by extending the bandwidth range. It could also move the bandwidth range 
downward if the previous measurement probes too fast or upward if too slow. The 
enlargement and adjustment of the bandwidth range could be used together as well. In 
the algorithm, zoom-out is invoked when the measurement does not identify a 
congestion point. Totally matching curves indicate that the probing speed is too low 
while totally diverging curves indicate that the probing speed is too high. Similar to 
zoom-in, a single zoom-out may still fall short. Therefore, we use an automatic 
procedure to determine if zoom-out is needed through the examination of the 
measurement result. The zoom-out can be viewed as a necessary procedure to locate 
the congestion point used in the calculation of the available bandwidth. 



3.3 The Measurement Algorithm 

The algorithm combines the variable speed probing and zoom-in/zoom-out to make it 
a flexible measurement algorithm. The algorithm can be summarized as follows. 

(1) Using n probing packets of size S and picking a bandwidth range based on past 
measurement or any knowledge or guess about the bottleneck bandwidth, 
invoke the basic probing technique and curve matching to detect the 
congestion point. The selection of the bandwidth range does not affect the 
fundamentals of the algorithm but only the overhead because a bad selection 
could result in more rounds of probing. 

(2) If the congestion point is detected through curve matching, calculate the result 
and determine its accuracy. If the specified or system default accuracy 
requirement is met, the measurement is finished. The bandwidth requirement 
of the probing packet at the congestion point is the result for the available 
bandwidth. The available bandwidth could also be computed by using more 
than one probing packet around the congestion point to neutralize the impact 
of traffic volatility during measurement. 

(3) If the congestion point is detected but the accuracy requirement is not met, the 
zoom-in procedure is invoked to determine a smaller bandwidth range for the 
next round of probing and measurement. The algorithm then continues by 
looping back to (1). 

(4) If the congestion point is not detected, the zoom-out procedure is invoked to 
determine a larger or a different bandwidth range through examination of the 
current measurement. The algorithm then continues by looping back to (1). 

From the above discussion, it is clear that the extra overhead of the measurement 
beyond the basic probing results from the high requirement on the quality of the 
measurement and from the flexibility of automatically adapting the algorithm to 
measure any bandwidth, both of which are desirable features in any measurement 
algorithm. Without the accuracy requirement, the zoom-in procedure could be 
avoided and, without the need for automatically adapting to any bandwidth, the zoom-out 
procedure could be avoided. No previous work achieves the same functionality 
and flexibility as zoom-in and zoom-out, not to mention the additional 
overhead. In terms of completeness and performance, we believe that our 
measurement algorithm is superior to all the previous work. 
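
The control flow of steps (1) to (4) can be sketched as below. Everything specific in this sketch is an assumption made for illustration: `toy_probe` replaces real probing and curve matching with a fixed available bandwidth, the accuracy test (ratio of the two probing speeds that bracket the congestion point) is one possible definition since the text does not define accuracy precisely, and the zoom factors are arbitrary.

```python
def toy_probe(available, n=30, size=1.0):
    """Hypothetical stand-in for one probing round plus curve matching."""
    def probe(b_low, b_high):
        first, last = size / b_low, size / b_high
        step = (first - last) / (n - 2)
        speeds = [size / (first - i * step) for i in range(n - 1)]
        if speeds[-1] <= available:
            return "too_slow", None      # curves match completely
        if speeds[0] > available:
            return "too_fast", None      # curves diverge from the start
        for lo, hi in zip(speeds, speeds[1:]):
            if lo <= available < hi:
                return "found", (lo, hi)  # congestion point bracketed
    return probe


def measure(probe, b_low, b_high, accuracy, max_rounds=20):
    """Return (estimate, rounds) following steps (1)-(4) of the summary."""
    for rounds in range(1, max_rounds + 1):
        status, bracket = probe(b_low, b_high)         # step (1)
        if status == "too_slow":                       # step (4): zoom-out up
            b_low, b_high = b_high / 2, b_high * 4
        elif status == "too_fast":                     # step (4): zoom-out down
            b_low, b_high = b_low / 4, b_low * 2
        else:
            lo, hi = bracket
            if lo / hi >= accuracy:                    # step (2): accurate enough
                return (lo + hi) / 2, rounds
            b_low, b_high = lo * 0.9, hi * 1.1         # step (3): zoom-in with margin
    raise RuntimeError("no result within max_rounds")


estimate, rounds = measure(toy_probe(3.2e6), 500e3, 10e6, accuracy=0.9)
print(estimate, rounds)
```

The margin kept around the bracket during zoom-in reflects the caution in the text against shrinking the range so far that the congestion point is missed in the next round.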



Fig. 2. Simulation Result with 60% Accuracy (legend: ForwardABW, BackwardABW, MeasuredABW; Accuracy: 0.6) 



4 Implementation and Simulation 

We have implemented the presented algorithm and are currently evaluating its 
performance through simulation and experimentation. We intend to fine-tune the 
algorithm through this effort and hope to gain more experience and draw 
some conclusions regarding available bandwidth measurement in general and the 
measurement algorithm and techniques presented in this paper in particular. At this 
moment, we are able to report some limited preliminary simulation results and 
expect to have more complete and comprehensive results in the near future, along with 
the lessons learned from the effort. 

The simulation results presented here are generated based on the following 
parameters: 

(1) The number of probing packets, i.e., n, is 30. 

(2) The size of the probing packets, i.e., S, is 1024 bits. 

(3) The initial bandwidth range, i.e., $(B_L, B_H)$, is (500 Kbps, 10 Mbps). 

(4) The intervals for the probing packets are determined using the formula for an 
equal decrement in time. 

(5) The actual available bandwidth fluctuates but is always below 10 Mbps. 
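
As a worked illustration of parameters (1) to (4), the first and last probing intervals and the equal decrement of the initial round can be computed directly. The calculation assumes decimal prefixes (1 Kbps = 10^3 bit/s, 1 Mbps = 10^6 bit/s), which the text does not state, and writes the lower and upper ends of the bandwidth range as $B_L$ and $B_H$ and the decrement as $\Delta$.

```latex
\begin{align*}
t_1     &= \frac{S}{B_L} = \frac{1024}{5 \times 10^{5}}\,\text{s} = 2.048\,\text{ms},\\
t_{n-1} &= \frac{S}{B_H} = \frac{1024}{10^{7}}\,\text{s} = 0.1024\,\text{ms},\\
\Delta  &= \frac{S\,(B_H - B_L)}{B_H B_L (n-2)}
         = \frac{1024 \times 9.5 \times 10^{6}}{10^{7} \times 5 \times 10^{5} \times 28}\,\text{s}
         \approx 0.0695\,\text{ms},\\
\sum_{i=1}^{n-1} t_i &= \frac{(n-1)\,(t_1 + t_{n-1})}{2}
         = \frac{29 \times 2.1504}{2}\,\text{ms} \approx 31.2\,\text{ms}.
\end{align*}
```

Under these assumptions, one round of 30 probing packets over the initial range is emitted in roughly 31 ms.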

We attach here two simulation results with the accuracy requirements of 60% and 
70% in Fig. 2 and Fig. 3, respectively. In both cases, the measurement is done 
multiple times and a curve is drawn based on the measurement results (in darker 
color) together with that for the actual available bandwidth (in light color) for the 
purpose of comparison. We can see that the measurement results for the 70% 
accuracy requirement are much better than those for the 60% accuracy. However, the 
overhead is higher: on average, a measurement incurs 3.2 rounds of probing for the 70% 
accuracy vs. 2.5 for the 60% accuracy. 



Fig. 3. Simulation Result with 70% Accuracy (legend: ForwardABW, BackwardABW, MeasuredABW; Accuracy: 0.7) 



We are currently running more simulations and will fine-tune the algorithm based on 
the experience gained from them. In addition, we are incorporating the self-similar 
traffic pattern into the simulation system and will soon start the simulation 
using this more realistic traffic pattern for the Internet. The present and future 
enhancements to the measurement algorithm and techniques as well as to the 
simulation environment will enable us to learn a great deal and gain more insight into 
Internet traffic dynamics and measurement capabilities. 



5 Conclusion 

We presented a new algorithm for the measurement of end-to-end available 
bandwidth. The measurement uses the active probing approach with a number of 
packets emitted at an increasing speed. By systematically collecting and comparing the 
RTTs of the probing packets, we can detect whether network congestion occurs, which is the basis 
for the derivation of the available bandwidth. We described the measurement 
techniques and discussed the various issues that make the algorithm adaptable to any 
network bandwidth and different measurement requirements. We extended the basic 
measurement technique by introducing the techniques of zoom-in and zoom-out to 
achieve the objectives. We also presented some preliminary simulation results to 
show that the algorithm performs as expected. We are currently doing extensive 
simulation and will move on to live experimentation to further fine-tune the algorithm 
and improve the measurement performance. We expect to report more complete and 
comprehensive simulation and experimentation results in the near future along with 
the lessons and experiences we will have learnt throughout this research and 
development effort. 



Abstract. Simulation is one of the most widely used techniques for designing 
network protocols. A simulation framework provides a sandbox 
where a harmful design flaw can easily be detected and removed. This 
is done prior to implementation and experimentation in an operational 
environment as it is easier and cheaper to carry out. However, simula- 
tion results can be distorted if the simulation model is unrealistic. In 
particular the topology model used by a protocol simulation can have a 
great impact on the results. In this paper we present a comparison of the 
results of an oriented multicast protocol simulation performed on some 
of the major topology models currently in use in the network research 
community. 



1 Introduction 

The aim of this paper is to highlight the impact of network topology on network 
protocol simulation. The wide use of simulators such as ns [14] or GloMoSim [12] 
by the scientific community for designing network protocols reinforces the need for 
realistic modeling at all levels. The topology models used in simulators have been 
quite simple since the beginning of network simulation but today’s computing 
power makes simulation possible over larger topologies (i.e. graphs). That’s why 
the use of small graphs following grid-like or random models should be changed 
in favor of bigger and more realistic graphs. 

Section 2 gives an outline of previous studies on the influence of network 
topology on protocol simulation as well as an overview of the existing topology 
models. Section 3 presents some properties of the Internet topology and exhibits 
the characteristics of the topology generators that we will use. Section 4 briefly 
describes the oriented multicast routing protocol that we will evaluate by sim- 
ulation. Section 5 shows the influence of the topology models on the protocol 
simulation results for a typical use of our oriented multicast protocol. 



2 Related Work 

The influence of topology on protocol simulation results was already noticed in 
1993 by Doar et al. The efficiency of their multicasting algorithms was reduced 
by 50% when using random graphs rather than hierarchical graphs [4]. In 1994, 
Wei et al. found that the average node degree of the topology model had an 
influence that could bias results by 30% when comparing the traffic concentra- 
tion in core-based multicast trees and in shortest path multicast trees [16]. Later 
in 1997, Zegura et al. showed that the average delay ratio of center to shortest 
path was increased by a factor that could go up to 100% when using transit-stub 
graphs rather than random graphs [17]. Recently Radoslavov et al. did a thor- 
ough study of the impact of topology on protocol design [13]. They studied three 
well known network topology models (i.e. Waxman, Tiers and ITM) and their 
influence on four multicast protocol paradigms (i.e. multicast trees, forwarding 
state aggregation, endsystem multicast and alternate path routing). They re- 
ported significant result differences depending on the topology model used. Also 
in a recent study Palmer et al. evaluated their STORM multicast algorithm 
with 4 topology models, including PLOD and Waxman [11]. Although the av- 
erage packet overhead was roughly the same (for a 50-client or above topology) 
whatever the generator used, the plot of the distribution of the percent of proto- 
col overhead per node was highly tied to the type of generator. This paper is an 
extension of these previous studies. We examine an oriented multicast routing 
protocol which is very dependent on the underlying topology and has not yet 
been studied in this context. We also use, in addition to the others, a new 
topology generator that matches more closely the properties of the Internet and 
that has not been tested in the previous studies (i.e. BRITE). 

Concerning network topology models, a well-known early model was defined 
by Waxman in 1988. This model places the nodes randomly on a plane and then 
creates links between nodes with a probability depending on the nodes’ euclidean 
distance [15]. The Waxman model belongs to what we call flat topology class. 
Circa 1996, two new generators were created, namely Tiers [3] and ITM [?]. 
Both are based on a structured network creation process designed to match the 
Internet architecture. The ITM topology model is called transit-stub because it is 
based on the Autonomous Systems’ structure. It has been widely used in network 
simulation tools (e.g. it is distributed in ns). Tiers is based on the LAN-MAN-WAN 
structure of the Internet. The Tiers and ITM generators both belong 
to what we call the hierarchical topology class. Recently new generators were 
created to build graphs that follow the power-law properties of the Internet. 
Some of them are already available for testing, namely BRITE [10] and Inet2 [6]. 
These generators belong to the power-law topology class. In our paper we will 
only deal with the generators of the last two classes. 



3 Network Topology 

A network is typically modeled as an undirected graph. The topology of the 
graph is a description of the way the nodes are connected together. Properties, 
such as the average node degree and the diameter, give information on a graph 
topology. For network protocols, the knowledge of the topology of the medium 
is very important as it directly translates into useful information such as path 
length and path redundancy. 

3.1 Internet Topology 

An accurate knowledge of the topology of real networks is necessary to design 
graph generators. Recently the topology of the Internet has been investigated extensively 
and new results have been discovered. In particular, some topological properties 
were found to comply with power-laws. For example, Faloutsos et al. discovered 
that the node degree distribution of the Internet topology complies with two 
power-laws [5] (both at the AS-level and at the router-level). The exponents 
of these power-laws concisely describe their corresponding distributions. The 
tree part of the Internet has also been studied by Magoni et al., who discovered 
three power-laws that apply to the size and depth distributions of the trees [8] 
(at both AS and router levels). 

3.2 Topology Models 

Table 1 shows the most common topology models currently in use by the research 
community. As software packages, they are all freely available except PLOD. 



Table 1. Topology models 

Class                   Model(s)       Date & references
Flat topology           Waxman         1988 - [15]
Hierarchical topology   Tiers          1996 - [3]
                        Transit-stub   1996 - [17]
Power-law topology      BRITE          1999 - [10]
                        Inet2          2000 - [6]
                        PLOD           2000 - [11]



In the flat topology models, the edges are created with a probability de- 
pending on the distance of the corresponding nodes. In the hierarchical topology 
models, subgraphs modeling network parts are first generated by using a flat 
topology method (as in Transit-Stub) or a spanning tree method (as in Tiers). 
Then the subgraphs are connected together in a way that enforces a multi-level 
tree-like structure. In the power-law topology models, the edges are distributed 
to the nodes in a way that matches the skewed node degree distribution of the 
Internet. This can be done by reverse engineering (as in Inet2 and PLOD) or by 
the use of preferential connectivity and incremental growth (as in BRITE). 
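
To make the idea of preferential connectivity with incremental growth concrete, here is a minimal sketch in the style of the well-known Barabasi-Albert construction. It is not BRITE's actual implementation; the function name, the seed clique and the parameter `m` (links added per new node) are assumptions chosen for illustration.

```python
import random


def preferential_growth(n, m, seed=0):
    """Grow an undirected graph to n nodes; each new node attaches to m
    distinct existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Start from a small clique on m + 1 nodes so every node has degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [u for e in edges for u in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_growth(200, 2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()), min(degree.values()))
```

Nodes that join early tend to accumulate many links, which produces the skewed degree distribution that the power-law generators aim for.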

We run the simulation only on the graphs of the last two classes. The flat
topology class, and the Waxman model in particular, has already been widely
studied and its drawbacks are well known. Furthermore, we do not test Inet2
because this generator has been designed to create AS-level topology graphs.

4 Protocol Simulation

We want to evaluate the influence of the topology model on the simulation
results of our agent search protocol. As it is not fully defined yet, we will call
it an algorithm rather than a protocol throughout the rest of this paper. This
algorithm has been described in a previous study [9]. In short, our algorithm is
an improvement over the expanding rings search mechanism. We want to find
agents that are located between a given source and destination. We assume that
the initiator of the search knows the address of the destination. Packets are
multicast in a controlled way, so that they do not stray too far from the shortest
path from the initiator to the destination. Each packet contains a range field
that indicates how many hops the packet is allowed to travel away from the
source-destination shortest path.

Our agent search algorithm is based on an oriented multicast algorithm that 
is very sensitive to the underlying network topology. This oriented multicast 
algorithm has also been described in a previous study [7]. We compare our agent 
search algorithm to the expanding rings search (ERS) algorithm. The expanding 
rings search has been described in protocols such as YAM [2] and QoSMIC [1]. 



5 Influence of Topology 

In this section we give the results of the simulation of the agent search and ERS
algorithms for each of the topology models tested. We also explain how we
obtained the results (i.e. how we set the parameters of the generators and the
simulator).



5.1 Simulation Parameters 

Table 2 gives the parameter settings of the generators. For each topology
generator, 20 graphs have been generated. Each graph has 2000 nodes, 1% of
which are agent nodes. Each algorithm (ours and ERS) has been tested on 500
different source-destination pairs, for four given source-destination distances,
for each graph. So for a given source-destination distance, each algorithm has
been tested 10000 times (500 pairs on each of the 20 graphs) per generator.

The simulations have been carried out with the network manipulator software.
It is a static network simulator that we have implemented in our laboratory.
It is static because it does not take into account any temporal aspect of
the communications. The results of these simulations have been merged to give
average results. We made these simulations for source-destination distances of
4, 8, 12 and 16 hops. For the ERS, the TTL is increased by 2 as long as no optimal
agent is found, starting at 1 up to a maximum value of 7 (i.e. 1, 3, 5, 7). For
our algorithm, the range is increased by 1 from 1 to 4. It is possible for the
algorithms not to find any optimal agent because they have to stop their search
at some point.
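The TTL schedule used for the ERS can be sketched as follows. This is only an illustration under stated assumptions: the graph is an adjacency dictionary, the function names are hypothetical, and the search stops at the first TTL that reaches any agent, whereas the paper's criterion for an "optimal" agent is not reproduced here.

```python
from collections import deque


def nodes_within(graph, source, ttl):
    """Return the set of nodes reachable from source in at most ttl hops (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == ttl:
            continue  # packets with an exhausted TTL are not forwarded further
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return set(dist)


def expanding_ring_search(graph, source, agents, ttls=(1, 3, 5, 7)):
    """Try each TTL in turn and return (ttl, agents_found, attempts).

    Returns (None, set(), len(ttls)) when the schedule is exhausted,
    which mirrors the fact that the search has to stop at some point.
    """
    for attempt, ttl in enumerate(ttls, start=1):
        found = nodes_within(graph, source, ttl) & agents
        if found:
            return ttl, found, attempt
    return None, set(), len(ttls)


# Hypothetical 10-node chain 0 - 1 - ... - 9 used only as a toy topology.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
```

On the toy chain, an agent four hops away is only reached at the third attempt (TTL 5), and an agent nine hops away is never reached because the schedule stops at TTL 7.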

Table 2. Parameter Settings for the Generators

Generator     Parameters        Value(s)
------------  ----------------  --------
Transit-Stub  Transits in top   20
              Nodes/transit     4
              Stubs/transit     2
              Nodes/stub        12
              Edge method       Waxman
              Alpha             0.5
              Beta              0.5
Tiers         NW                1
              NM                220
              NL                0
              SW                680
              SM                6
              SL                0
              RW                6
BRITE         HS                1000
              LS                100
              m                 1, 2
              Node placement    Random
              PC                Only
              IG                Active

5.2 Simulation Results 



In this section, we present the simulation results for four variables of interest.
These variables are the bandwidth usage, the number of optimal agents found,
the number of attempts to find an optimal agent and the efficiency. For each
variable, we calculate its value by using our agent search algorithm and by using
the ERS algorithm. We divide the former value by the latter to obtain a ratio
that enables an easier comparison. Furthermore, as we carried out tests on four
different distances, we calculate the average of the four ratio values. So we have
one ratio value left for each of the generators used.
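The reduction to one ratio per generator can be sketched as below. The function name and the sample numbers are hypothetical and serve only to show the order of operations described above: one ratio per distance first, then the average of the four ratios.

```python
def average_ratio(ours_by_distance, ers_by_distance):
    """Average, over the tested distances, of (our value / ERS value)."""
    ratios = [
        ours_by_distance[d] / ers_by_distance[d]
        for d in sorted(ours_by_distance)
    ]
    return sum(ratios) / len(ratios)


# Made-up packet counts for distances of 4, 8, 12 and 16 hops.
ours = {4: 150, 8: 100, 12: 200, 16: 300}
ers = {4: 100, 8: 100, 12: 100, 16: 200}
example = average_ratio(ours, ers)  # per-distance ratios: 1.5, 1.0, 2.0, 1.5
```

Note that averaging the per-distance ratios is not the same as dividing the two overall averages, so the order matters when reproducing such figures.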

Figure 1 shows the bandwidth ratio given by each of the topology models. 
For example, the Transit-Stub topology model has a value of 1.4. This means 
that our agent search algorithm creates on average 40% more packets than the 
ERS algorithm. We can clearly see that there are big differences between the 
results and that they depend on the type of graphs used for the simulation (i.e. 
the kind of topology model used). We can already say that the topology model 
used has a big influence on the results. The biggest gap is a -68% difference that 
can be found between the Tiers ratio and the BRITE 2 (i.e. with m = 2) ratio. 

Fig. 1. Bandwidth Ratio

Figure 2 shows the optimal agent hit ratio. For example, the Transit-Stub
model has a value of 1.5. This means that our algorithm finds 50% more
optimal agents than the ERS algorithm. The variations between the topology
ratio values are of lesser importance than in the previous figure but they are
still significant.

Fig. 2. Optimal Agent Hit Ratio



Figure 3 shows the average number of attempts needed to find at least one
optimal agent. The Transit-Stub value of 0.6 means that our algorithm needs
on average 40% fewer attempts to find at least one optimal agent than the ERS
algorithm. Here too, the values depend on the topology model.

Fig. 3. Attempt Number Ratio

We have defined a ratio called efficiency to be able to assess the performance
of the algorithms. The efficiency is equal to the number of optimal agents found
divided by the number of packets emitted in the network. As before, we divide
the efficiency of our algorithm by the efficiency of the ERS algorithm to obtain
an efficiency ratio. Figure 4 shows the efficiency ratio of each topology model.
They are all greatly different. Tiers favors the ERS over our algorithm, while
the others favor our algorithm. The difference between the Tiers ratio and the
BRITE 1 ratio reaches +198%. The performance of our algorithm relative to the
ERS algorithm is heavily influenced by the topology model used.
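As a minimal sketch of the efficiency definition above (function names and raw counts are hypothetical), the efficiency ratio can be computed as follows. The counts are chosen so that our algorithm finds 50% more optimal agents with 40% more packets, matching the Transit-Stub ratios quoted earlier; the value in Figure 4 may differ, since the published ratios are averaged over four distances.

```python
def efficiency(optimal_agents_found, packets_emitted):
    """Optimal agents found per packet emitted in the network."""
    return optimal_agents_found / packets_emitted


def efficiency_ratio(ours, ers):
    """Efficiency of our algorithm divided by the efficiency of the ERS.

    Each argument is an (optimal_agents_found, packets_emitted) pair.
    """
    return efficiency(*ours) / efficiency(*ers)


# Hypothetical counts: 150 agents / 1400 packets versus 100 agents / 1000 packets.
transit_stub_like = efficiency_ratio((150, 1400), (100, 1000))  # about 1.07
```

A ratio above 1 favors our algorithm, a ratio below 1 favors the ERS.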

Fig. 4. Efficiency Ratio (topology models: Transit-Stub, Tiers, BRITE 1, BRITE 2)

6 Conclusion

We showed on a particular multicast algorithm example that the kind of topology
model used for protocol simulation has a crucial impact on the simulation results.
The performance of a protocol can be favored by one topology model and
disfavored by another. This situation can lead researchers to avoid using
simulation for protocol design. Perhaps the best way to draw acceptable
conclusions would be to use a topology model that is closest to the real network
topology where the new protocol will be deployed. For IP protocols, and routing
protocols in particular, the use of the most recent topology models (i.e. of the
power-law topology class) should be recommended. However, there is still room
for improvement in creating a topology model that would match the Internet
topology. Indeed, even the most up-to-date generators do not take into account
all of the Internet topology properties that have been newly discovered.
Simulation is such an important tool in network research that it cannot be
neglected because of a lack of realistic topology generators. It is clear that new
enhanced topology models will appear in the near future to reduce the bias owing
to topology on protocol simulation results.

Abstract. Compared with the traditional single path routing model,
multipath routing increases total network utilization and end-to-end
performance. When disseminating traffic over multiple paths, routers should
adaptively allocate flows to each path in order to achieve load balancing
among multiple paths, as most IP flows are short-lived and the flow size
is not normally distributed. Moreover, routers should send the packet
streams belonging to a flow to the same next hop so as not to cause
end-to-end performance degradation. This paper proposes an adaptive
multipath load control method using a flow classifier which detects long-lived
flows through the flow characteristics of duration and size. By
dividing flows into long-lived and short-lived, congestion from the bursty
transient flows may be avoided. It is shown by simulation experiments
with a real packet trace that the proposed algorithm adaptively
controls the load of multiple paths, satisfying the given load ratio, and that
the per-flow states at routers can be kept minimal by aggregating
flows with the destination network prefix.

Keywords: Flow, load control, multipath 



1 Introduction 

A router capable of multipath routing maintains multiple next-hop nodes for 
the same destination in its routing table. Multipath routing provides increased 
bandwidth and enhances the utilization of network resources more than the tra- 
ditional Internet routing mechanism based on the single shortest path algorithm. 

Multipath routing has been incorporated in several routing protocols. The
best-known one is Equal-Cost Multi-Path (ECMP) routing. This is explicitly
supported by Open Shortest Path First (OSPF) and Intermediate System
to Intermediate System (IS-IS). Some router implementations allow equal-cost
multipath for Routing Information Protocol (RIP). In the Multi-Protocol Label
Switching (MPLS) network, where IP datagrams are switched by looking up the
fixed-size label, paths between an ingress router and an egress router are
explicitly set up by Explicitly Routed Label Distribution Protocol (ER-LDP) or
the Resource ReSerVation Protocol (RSVP). Therefore, multiple explicit Label
Switched Paths (LSPs) between an ingress router and an egress router can be set
up, and even non-shortest paths can be used for multipath routing.

* This work was supported in part by the Brain Korea 21 project of Ministry of
Education, in part by the National Research Laboratory project of Ministry of
Science and Technology, 2001, Korea.

When forwarding packets to multiple paths, routers should have an adaptive 
load control function for load balancing across parallel paths in order to sup- 
port dynamic traffic behaviors and varying link/path characteristics (available 
bandwidth, delay, and packet loss rate). Otherwise, some of multiple paths may 
experience significant congestion due to the high traffic load. 

In this paper, we propose a simple flow-classifier-based algorithm as the
flow-aware adaptive multipath load control scheme. The proposed algorithm has
the following features.

— Flow-level load control of multiple paths when the load ratio for each path
is given: The input traffic can be split to satisfy the pre-defined load ratio
of each path in a flow-level multipath forwarding mode. The sequence of IP
packet streams should be maintained within a flow. Otherwise, the receiver
must handle out-of-order packet arrivals with a large buffer, and end-to-end
performance will be degraded.

— Minimal per-flow states: The number of per-flow states retained by a router
should be as small as possible.

— Differentiation between long-lived flows and short-lived ones: [?] suggests
that long-lived flows have less bursty arrival characteristics than short-lived
flows. The bursty transient flows can abruptly increase the queue length at
routers, causing packet losses.

The organization of this paper is as follows. In Section 2 the related work
on multipath routing and traffic engineering is explained. Section 3 presents the
flow-level adaptive load control problem in multipath forwarding. Then, Section
4 describes the proposed flow-level load control algorithm. The results of the
performance evaluation are discussed in Section 5, and the conclusion and future
work are given in Section 6.

2 Related Work 

There have been many studies on multipath routing. [?] proposes a multipath
forwarding extension scheme for the distance vector and the link state routing
protocols. In [?], Quality-of-Service (QoS) routing via multiple paths under a
time constraint is proposed for the case where the bandwidth can be reserved,
assuming all the reordered packets are recovered by the optimal buffer at the
receiver, which causes the overhead of dynamic buffer adjustment at the
receiver. In connection-oriented networks, [?] has analyzed the performance of
multipath routing algorithms and has shown that the connection establishment
time for multipath reservation is significantly lowered. [?] has proposed a
dynamic multipath routing algorithm in connection-oriented networks, where the
shortest path is used under light traffic conditions and multiple paths are
utilized as the shortest path becomes congested.

To avoid the negative effects of the bursty short-lived flows, an enhanced
routing scheme separating long-lived and short-lived flows is proposed in [?],
where long-lived flows are dynamically routed whereas transient flows are
forwarded on the pre-provisioned paths. However, the flow trigger is considered
only under the static network provisioning policy. In [?], a hashing-based load
control method without flow states is proposed, but the load adaptation scheme
for the dynamic network and traffic behavior is not well presented. In [?], it is
shown that the quality of service can be enhanced by dividing the transport-level
flows into UDP and TCP flows. Yet, it does not consider the aggregated flows.



3 The Flow-Level Load Control Problem 

In this section, we examine how packet-level multipath forwarding may degrade 
the end-to-end throughput and why the adaptive flow-level load control method 
should be devised for multipath forwarding. 

3.1 Negative Impacts on End-to-End Performance 
by Packet-Level Multipath Forwarding 

Packet-level multipath forwarding in a round-robin fashion may cause the end- 
to-end performance degradation. 




Fig. 1. The simulation topology 



When the router R1 distributes incoming packets destined for D to two next
hops (R2 and R3) concurrently (Fig. 1), the effect of the different delays on TCP
performance is illustrated in Fig. 2. Fig. 2(a) represents the case where both
the upper path (R1 - R2 - R4) and the lower path (R1 - R3 - R4) are set to
100 ms, and S sends packets to D after opening an FTP connection¹. Fig. 2(b)
is for the same FTP connection run under different delays (the upper path
set to 200 ms). In Fig. 2(b), the congestion window (cwnd) at S periodically
decreases by half due to the fast retransmit and fast recovery algorithms,
resulting in poor TCP throughput. When two paths have different delays, packets
with higher sequence numbers may arrive at the receiver too early, causing the
receiver to send duplicate ACKs. After receiving three duplicate ACKs, the
sender retransmits the late packet and reduces cwnd. In addition to the three
duplicate ACK problem, cwnd grows slowly because the ACKs with lower sequence
numbers, which may arrive at the sender later than the ACKs with higher
sequence numbers, are ignored.

(a) Same link delays (b) Different link delays

Fig. 2. TCP congestion window behavior under different path delays in packet-level
multipath forwarding

¹ This simulation was tested for TCP Reno with NS-2 [13].
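The reordering effect described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the NS-2 experiment: packets are sent at a fixed spacing (the `gap_ms` parameter is hypothetical), alternate over the paths in round-robin order, and the receiver sends one cumulative ACK per arrival. Retransmissions and cwnd dynamics are not modeled; the code only counts duplicate ACKs and the number of times three duplicates occur in a row.

```python
def count_duplicate_acks(num_packets, delays_ms, gap_ms=10):
    """Return (duplicate_acks, fast_retransmit_triggers) at the receiver side."""
    # Packet seq leaves at seq * gap_ms on path seq % len(delays_ms).
    arrivals = sorted(
        (seq * gap_ms + delays_ms[seq % len(delays_ms)], seq)
        for seq in range(num_packets)
    )
    expected = 0          # next in-order sequence number
    buffered = set()      # out-of-order packets held by the receiver
    duplicate_acks = 0
    run = 0               # consecutive duplicate ACKs for the same number
    triggers = 0
    for _, seq in arrivals:
        if seq == expected:
            expected += 1
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
            run = 0       # a new cumulative ACK ends the duplicate run
        else:
            buffered.add(seq)
            duplicate_acks += 1
            run += 1
            if run == 3:
                triggers += 1   # third duplicate ACK: fast retransmit
    return duplicate_acks, triggers


same_delay = count_duplicate_acks(20, (100, 100))
different_delay = count_duplicate_acks(20, (100, 200))
```

With equal delays the packets arrive in order and no duplicate ACK is produced, while a 100 ms / 200 ms split produces duplicate ACKs and at least one fast-retransmit trigger even though nothing was lost.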



3.2 Skewed Flow Characteristics 

Most IP flows² are shown to be short-lived and small, whereas a few have
long durations and large traffic loads, dominating the total traffic load in a link
or path [?]. Hence, we examine the load balancing condition in general flow-level
multipath forwarding.

Assuming that packet arrivals are modeled as packet trains [?], multiple paths
are identical, and the packet size is normally distributed, the flow-level
round-robin load balancing can be explained by the following lemma, which is
defined for load balancing by multiple identical servers in [?].

Lemma 1. (Flow-level Round-Robin Load Balancing): Let l_i be a random
variable describing the total delivery time required for all the flows mapped to a
given path P_i. Let r' be a random variable of the delivery time for a flow, N be
the number of packets, and N' be the number of flows in the batch or train. If N'
flows are assigned to m paths in a round-robin manner, then the square of the
coefficient of variation of l_i is given by

$$ CV[l_i]^2 = \frac{m}{N'} \, CV[r']^2 $$    (1)

and hence, when r' has finite variance and N' ≈ N,

$$ \lim_{N \to \infty} CV[l_i] = 0. $$    (2)

From the above lemma it is concluded that for a sufficiently large packet train
size, the loads in a multiple path set are balanced if the coefficient of variation
of this normal distribution tends to zero.

² IP flows are defined by packet arrivals satisfying the end point specification
(network addresses, transport protocol, and application port) within a time
interval.

N' and r' are dependent on the flow organization in a batch or train. The
number of flows N' for the given N packets varies from 1 to N. Therefore, when
a few flows carry most of the packets (N' ≪ N), load balancing cannot be
achieved, because N' is quite small compared to the large N, especially when the
flow granularity is coarse.
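A short numeric sketch of this dependence on N' follows. It assumes that the lemma's relation reads CV[l_i]^2 = (m / N') CV[r']^2, which is what one obtains when each path receives N'/m independent, identically distributed flow delivery times; the function name and the sample values are illustrative only.

```python
import math


def path_load_cv(cv_flow, num_flows, num_paths):
    """Coefficient of variation of the per-path delivery time l_i.

    Assumes CV[l_i]^2 = (num_paths / num_flows) * CV[r']^2, i.e. each path
    carries num_flows / num_paths i.i.d. flow delivery times.
    """
    return math.sqrt(num_paths / num_flows) * cv_flow


# Same flow-time variability (CV[r'] = 2) and m = 2 paths, different N'.
few_flows = path_load_cv(2.0, 8, 2)      # coarse granularity
many_flows = path_load_cv(2.0, 800, 2)   # fine granularity
```

With only 8 flows the per-path load still varies by about 100% of its mean, whereas with 800 flows the variation drops to about 10%, which is the sense in which a small N' prevents load balancing.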

When f_i and r_i denote the number of packets and the delivery time of a
packet for a flow i respectively, the flow delivery time r' will be f_i · r_i. The
expectation of the flow delivery time r' is as follows:

$$ E[r'] = E[f] \cdot E[r] $$    (3)

Therefore, the square of the coefficient of variation of the flow delivery time
will be as follows.

$$ CV[r']^2 = \frac{Var[r']}{E[r']^2} = \frac{Var[f \cdot r]}{(E[f] \cdot E[r])^2} $$    (4)

Thus, Var[f] should be finite in order for the flow delivery time r' to have
a finite coefficient of variation. However, the skewed flow size distribution may
result in a very large variation. In Fig. 3³, for example, even 1 % of flows contain
65 - 90 % of the load in byte percentage, and 57 - 88 % in packet percentage.

4 The Proposed Load Control Scheme

We develop a control scheme for routers with two next hops (a primary path and
a secondary one) for the same destination. This scheme can be easily extended
to multiple next-hop cases.

4.1 Flow Classification

For flow assignment, flows which have long duration, high bit rate, and large
flow size (called "base" flows) are distinguished from short-lived transient ones,
and assigned to the primary path. Fig. 4 depicts the ingress router with the
flow classifier. Packets not belonging to base flows (called "transient" flows) are
forwarded to the secondary path.

³ This trace was measured for one hour on KORNET, a commercial Korean Internet
backbone, by Cisco NetFlow [14].

(a) Byte percentage (b) Packet percentage

Fig. 3. Traffic load distribution of 1/5/10 % flows

Fig. 4. The ingress router with load control (outputs: Primary Path, Secondary Path)



The base flow detection is based on the X/Y (X: packet count, Y: timeout)
flow classifier used in IP switching [?]. In the X/Y flow classifier, a flow is
detected when X packets with the same flow specification arrive within Y seconds.
This means that the initial X packets of a base flow are forwarded to the
secondary path together with the transient flows. By adjusting X and Y, we can
easily control the load assigned to each path. If we increase X (or decrease Y),
then fewer flows will be detected and the load on the primary path will decrease.
Decreasing X (or increasing Y) will do the opposite. Thus, an adaptive X/Y flow
classifier can adapt to dynamic path and traffic behaviors. The packet forwarding
module delivers an incoming packet to the appropriate next hop by looking up
the flow table.
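A minimal sketch of an X/Y classifier of this kind is given below. The class and method names are hypothetical, and the window handling is an assumption made for illustration: the Y-second window starts at the first counted packet of a flow and counting restarts once it expires. Flow-table expiry is omitted.

```python
class XYFlowClassifier:
    """Detect a base flow once X packets of it arrive within Y seconds."""

    def __init__(self, x_packets, y_seconds):
        self.x = x_packets
        self.y = y_seconds
        self.pending = {}        # flow_id -> (window_start, packet_count)
        self.base_flows = set()  # flows already detected as base flows

    def classify(self, flow_id, now):
        """Return the path ('primary' or 'secondary') for one packet."""
        if flow_id in self.base_flows:
            return "primary"
        start, count = self.pending.get(flow_id, (now, 0))
        if now - start > self.y:
            start, count = now, 0   # window expired: restart counting
        count += 1
        if count >= self.x:
            # Detected: later packets use the primary path. This packet,
            # like the earlier ones, still goes to the secondary path.
            self.base_flows.add(flow_id)
            self.pending.pop(flow_id, None)
        else:
            self.pending[flow_id] = (start, count)
        return "secondary"


classifier = XYFlowClassifier(x_packets=3, y_seconds=1.0)
fast_flow = [classifier.classify("f", t) for t in (0.0, 0.1, 0.2, 0.3)]
slow_flow = [classifier.classify("g", t) for t in (0.0, 2.0, 4.0, 6.0)]
```

The fast flow is moved to the primary path from its fourth packet on, while the slow flow never collects three packets within one second and stays on the secondary path. Raising X or lowering Y makes detection harder and so shifts load away from the primary path.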

4.2 Load Control Algorithm

The load ratio of the primary path is measured as the number of packets sent
along the primary path over the total number of packets. The load control
algorithm uses the adaptive base flow classifier to meet the given load ratio of
the primary path. Although there are two possible adaptive parameters in the X/Y
base flow classifier, the flow size, X, is chosen to be variable. The flow size X of
the adaptive base flow classifier is adjusted according to the most recent base
flow load (BFL(t)). If BFL(t), which is smoothed from the previous value
(BFL(t - 1)) and the recent sample (SampleBFL), is greater than the given base
flow load threshold (BFL_thr), the flow size X is increased by Δ. Otherwise, the
flow size is decreased multiplicatively by the pre-defined constant, C. Δ is set
such that the base flow size estimator X does not increase too quickly. A constant
k is used to adjust the increasing amount of Δ.

$$ \Delta = \frac{X}{k}, \quad (k > 1) $$    (5)

The most recent base flow load in the interval [t-1, t] uses a first-order filter
to dampen abrupt fluctuations of the base flow load. When α approaches
1 (0 < α < 1), abrupt changes are suppressed.

Algorithm 1. Adaptive Load Control Algorithm
1: BFL(t) = α · BFL(t - 1) + (1 - α) · SampleBFL
2: if (BFL(t) > BFL_thr) then
3:   Δ = X / k
4:   X = X + Δ
5: else
6:   X = X / C
7: end if
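One update step of the adaptive load control can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the increase step is taken to be X / k, the default values of alpha, k and c are arbitrary, and the lower bound `x_min` is added here only to keep X from shrinking to zero.

```python
def update_flow_size(x, bfl_prev, sample_bfl, bfl_thr,
                     alpha=0.9, k=4.0, c=2.0, x_min=1.0):
    """Return (new_x, bfl) after one measurement interval.

    x          -- current packet-count threshold X of the X/Y classifier
    bfl_prev   -- smoothed base flow load from the previous interval
    sample_bfl -- base flow load measured in the latest interval
    bfl_thr    -- target load ratio of the primary path
    """
    # First-order filter: alpha close to 1 suppresses abrupt changes.
    bfl = alpha * bfl_prev + (1 - alpha) * sample_bfl
    if bfl > bfl_thr:
        # Too much load on the primary path: make detection harder.
        x = x + x / k            # assumed increase step, Delta = X / k
    else:
        # Too little load on the primary path: make detection easier.
        x = max(x_min, x / c)    # x_min is an added safeguard
    return x, bfl


raised_x, _ = update_flow_size(8.0, 0.8, 0.8, 0.5, alpha=0.5)
lowered_x, _ = update_flow_size(8.0, 0.2, 0.2, 0.5, alpha=0.5)
```

Starting from X = 8 with a threshold of 0.5, a smoothed load of 0.8 raises X to 10, and a smoothed load of 0.2 halves it to 4.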



5 Performance Evaluation 



To evaluate the load control algorithm, packet traces at the border router of 
our campus network were captured with tcpdump, and the full routing table 
of the border router was used. The traffic through the border router shows an 
average of 4 - 5 Mbps and the traditional traffic pattern of TCP(FTP, WWW) 
applications. 

To compare the pre-defined load ratio of the primary path and the detected
base flow load, we define the normalized base flow load ratio variation, B (%),

$$ B = 100 \times \frac{|BFL_a - BFL_{thr}|}{BFL_{thr}} $$    (6)

where BFL_thr and BFL_a are the threshold and the acquired base flow load
ratio for the primary path, respectively.

(a) Normalized BFL Variation (b) Maximum Number of Flows

Fig. 5. Normalized base flow load ratio variation and maximal number of flows



The proposed algorithm requires per-flow state at routers and is therefore not
scalable as such. This is overcome by aggregating flows going to the same
destination. The aggregation can be done at different levels: application, host
pair, destination host, or destination network.

From Fig. 5(a) we can see that the proposed algorithm satisfies the base
flow load threshold within 10 %. Among the four flow aggregation types, the
destination host flow mode shows the lowest normalized variation (1 %) under
various base flow load thresholds. This is because the variation of the flow size
and the flow duration is rather high for all types except the destination host
flow aggregation, which generates normally distributed flows.

For the proposed algorithm, the ingress router should maintain all the
per-flow states. The maximum number of flows will affect the scalability of the
proposed algorithm. In Fig. 5(b), the destination network prefix aggregated
flows require the minimum number of per-flow states even at a high threshold. In
conclusion, we can see that the base flow load assigned to the primary path does
not deviate much from the given threshold with the minimal memory requirement.



6 Conclusion 

In this paper, we proposed an adaptive flow-level load control algorithm for
practical multipath forwarding. It is shown by experiment that the proposed
algorithm, which uses the adaptive X/Y flow classifier, divides the input traffic
in a way that satisfies the pre-defined load ratio of multiple paths in order to
absorb the dynamic flow characteristics. The number of per-flow states required
for a multipath packet forwarding router can be minimized by aggregating flows
with the destination host or network prefix. Through this load control scheme,
the network resource can be fully utilized and the congestion from the bursty
transient flows can be avoided. The proposed load control scheme will be useful
for multipath packet forwarding without much additional overhead at routers.

Abstract. In this paper, we analyze by queuing simulation the performance
of the CANIT (Congestion Avoidance with Normalized Interval of Time)
algorithm in the presence of congestion losses. In a former work [3], we proposed
the algorithm (CANIT) for the TCP (Transmission Control Protocol)
congestion avoidance phase in order to improve fairness during this phase,
and we showed that using the CANIT algorithm in an environment without
loss, instead of the standard congestion avoidance algorithm, improves both
congestion avoidance fairness and bandwidth utilization for long RTT
connections. In this paper, we consider congestion losses and show that
fairness as well as bandwidth utilization are better when using the
CANIT algorithm than with the standard one. Moreover, the losses with the
CANIT algorithm are equivalent to those with standard congestion avoidance.



1 Introduction 

TCP (Transmission Control Protocol) is a sliding window protocol which
allows the sender to transmit a given number of segments before receiving an
acknowledgment (ACK). TCP uses a set of congestion control algorithms: slow
start, congestion avoidance, fast retransmit and fast recovery [?], to control
the sliding window size used by the sender and to retransmit lost packets.

The slow start algorithm is used at the beginning of a TCP connection or
after a congestion detected by a timeout¹. The Congestion Window size (CWnd) is
initialized to 1 segment (in practice, CWnd is measured in bytes, usually 512
bytes for one segment, but, to simplify discussion, it is expressed here in terms
of segments). For each acknowledgment received, the slow start algorithm
increases CWnd by one segment, providing an exponential increase of the sliding
window size. The slow start phase continues until either CWnd reaches a given
Slow Start Threshold (SSThresh) value or a segment loss is detected.

The slow start phase is followed by the congestion avoidance phase, during which
the value of CWnd is greater than or equal to SSThresh, and the TCP sender
increments its current CWnd by 1/CWnd each time an ACK is received (here,
CWnd is measured in number of segments). The congestion avoidance phase
continues until a segment loss is detected. If congestion is detected by a
timeout, the congestion window is set to one segment after a retransmission
timeout, and the sender proceeds with the slow start. Otherwise, the fast
retransmit and fast recovery algorithms are used. These allow TCP to detect and
recover from segment drops. For details, see [?].

¹ TCP sender detects losses (congestion) in two different ways: by a timeout or by
reception of three duplicate ACKs.

P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 780- 178^ 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

While these algorithms are very important, they can also have a negative
impact on the performance of TCP over long delay links, for example, satellite
links [?].

In this paper, we focus on the congestion avoidance algorithm. The policy used
in that algorithm is unfair when multiple connections with different RTTs share
the same resource in the network. In fact, the senders of long RTT connections
wait longer to receive an ACK than those of short RTT connections. So,
the latter increase their sliding windows more frequently. Notice that the Round
Trip Time consists of a segment transfer time and its ACK transfer time. TCP
estimates its RTT at the beginning of a connection by sending one segment at
connection establishment and waiting for its ACK. For details, see [?].

In order to improve TCP fairness, several solutions have been suggested [?].
In [3], we proposed a new algorithm for the congestion avoidance phase, named
the CANIT algorithm (Congestion Avoidance with Normalized Interval of Time).
That algorithm increases, for all connections sharing the same network resources,
the size of the congestion windows by approximately the same number of segments
during the longest RTT of these connections. For that, it uses a new parameter
NIT (Normalized Interval of Time), which represents an interval of time during
which each connection must increase its CWnd by one segment. In [3], we
considered an environment without losses and showed that CANIT is fairer than
the standard algorithm. In this paper, we consider a lossy environment and we
compare the performance of the CANIT algorithm to that of the standard one,
using one of the configurations used in [3] with finite buffers.

The paper is organized as follows. Section 2 briefly outlines the congestion
avoidance and CANIT algorithms. Section 3 presents our configuration and its
queuing model. In section 4, simulation results are used to compare the
performance of the standard algorithm and CANIT in the presence of congestion
losses. Finally, section 6 presents our conclusions.



2 Congestion Avoidance and CANIT Algorithms 



When CWnd becomes greater than SSThresh in the slow start phase or after the
fast retransmit and fast recovery phases, the congestion avoidance phase is used
[?]. In this phase, the TCP sender increments its current CWnd by

$$ \frac{SegSize \times SegSize}{CWnd} $$

each time an ACK is received, where CWnd is measured here in bytes, and
SegSize is the size of segments (usually equal to 512 bytes). That means that
CWnd is roughly increased by SegSize per RTT.

As mentioned above, the policy which is used in the congestion avoidance phase
is unfair when multiple connections with different RTTs share the same resources
in the network. In fact, waiting for an ACK in long RTT connections (for example
satellite links) takes more time than in short RTT ones. Short RTT connections
therefore increase their sliding windows more frequently.

CANIT is proposed in [3] in order to improve congestion avoidance fairness.
It increases congestion window sizes for all connections sharing the same network
resources by approximately the same number of segments during the longest
RTT of these connections. For that, it uses a new parameter NIT (Normalized
Interval of Time), which represents an interval of time during which each
connection must increase its CWnd by one segment. Thus, the following policy is
used by all connections:

— For a connection, after reception of an acknowledgment (ACK), the TCP sender
increments its CWnd by

$$ \frac{RTT}{NIT} \times \frac{SegSize \times SegSize}{CWnd} $$

where RTT is the round trip time of the connection, SegSize is the segment
size and NIT is the normalized interval of time.

As mentioned above, in order to make discussion easier, we express CWnd
in number of segments. Then, the additive increase can be expressed by

$$ \frac{RTT}{NIT} \times \frac{1}{CWnd} $$
Consequently, in the absence of losses, the TCP sender of a given connection k
receives, at time t, CWnd(t - RTT_k) acknowledgments during its RTT (RTT_k).
This means that CWnd is increased by RTT_k/NIT segments after each RTT_k. Then,
if we consider the interval of time NIT (which must be shorter than RTT_k), we
could say that CWnd is incremented by one segment during this interval regardless
of the value of RTT_k. However, this is not strictly true, because the sender does not
receive the ACKs uniformly (i.e. the arrival process of ACKs is not uniformly
distributed). Indeed, the TCP sender usually receives ACKs in bursts because the
segments are sent in bursts at the beginning. Our objective, however, is that all
connections increase their congestion windows by the same size during a certain
interval of time (which is greater than or equal to the longest RTT), even if the
connections with short round trip times open their windows more frequently than
those with long RTTs.
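
The per-ACK rule can be illustrated with a small sketch. It is not the authors' simulator: it assumes an idealized, loss-free model in which a sender receives CWnd ACKs per RTT and every ACK of a round uses the window size from the start of that round. The function names, the RTT values (30 ms and 810 ms), the initial window and the observation interval are illustrative assumptions.

```python
def standard_increment(cwnd):
    """Per-ACK increase of standard congestion avoidance, in segments: 1 / CWnd."""
    return 1.0 / cwnd


def canit_increment(cwnd, rtt_ms, nit_ms):
    """Per-ACK increase under CANIT, in segments: (RTT / NIT) * (1 / CWnd)."""
    return (rtt_ms / nit_ms) / cwnd


def window_growth(rtt_ms, interval_ms, per_ack_increment, cwnd0=10.0):
    """Window growth in segments over interval_ms (idealized, no losses).

    Assumption: in every RTT the sender receives CWnd ACKs, and all of them
    apply the increment computed from the window at the start of that RTT.
    """
    cwnd = cwnd0
    for _ in range(interval_ms // rtt_ms):
        cwnd += cwnd * per_ack_increment(cwnd)
    return cwnd - cwnd0


NIT_MS = 30          # hypothetical NIT, equal to the shorter RTT
INTERVAL_MS = 8100   # ten round trips of the slower connection
for rtt in (30, 810):
    std = window_growth(rtt, INTERVAL_MS, standard_increment)
    canit = window_growth(
        rtt, INTERVAL_MS, lambda w, r=rtt: canit_increment(w, r, NIT_MS)
    )
    print(f"RTT {rtt} ms: standard +{std:.1f}, CANIT +{canit:.1f} segments")
```

Under these assumptions the standard rule lets the short-RTT connection grow far faster than the long-RTT one, while the CANIT rule gives both the same growth (one segment per NIT). Real ACK arrivals are bursty, so an actual implementation would only approximate this.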
3 Configurations and Queueing Models

In jSj, we studied CANIT performance in the absence of losses for two different
configurations (depicted in Figure 1). The first is composed of 5 connections
with different round trip times sharing a single bottleneck link. The second
configuration is used to study the impact of a long-RTT connection which traverses
a number of short-RTT connections. We considered a traditional “first-in,
first-out” (FIFO) queueing scheme and gateways with infinite buffers.

In this paper, we study both of these configurations and consider a lossy
environment in order to compare the performance of the CANIT algorithm to that of
the standard congestion avoidance algorithm. For that, we consider gateways with
finite buffers. In this case, segments arriving from a sender at a gateway are
lost when the gateway buffer is full. The TCP sender detects this loss when no ACK
is received for a certain segment (either by a timeout or by reception of three
duplicate ACKs).

Fig. 1. Two simulation configurations



3.1 The Configuration 

In this paper we give only the results for configuration 1; those of configuration 2
are equivalent. Configuration 1 is used to study the performance of our
algorithm for 5 connections with different delays sharing a single
bottleneck link.

Let Si be the source of connection i, and linki (i = 1, ..,5) the link between Si
and the Gateway.

linki (i = 1, ..,5) has a delay equal to 10 ms, 20 ms, 100 ms, 200 ms and 400 ms
respectively, and each linki has a capacity equal to 10 Mb/s.

The shared link (linksh) has a delay equal to 5 ms and a capacity equal to
1.5 Mb/s.

We assume that the acknowledgment path has the same delay as the forward path.
The values of RTTi (i = 1, ..,5) are then equal to 2 x (linki delay +
linksh delay).
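
As a quick check of the formula above, the round trip times can be computed directly. This is only a sketch: the shared-link delay of 5 ms is an assumption, chosen because it is consistent with the shortest RTT of 30 ms quoted in Section 4, and the names are illustrative.

```python
# Access-link delays of configuration 1, in milliseconds.
LINK_DELAYS_MS = [10, 20, 100, 200, 400]
# Assumption: shared-link delay of 5 ms (matches the 30 ms minimum RTT of Section 4).
SHARED_DELAY_MS = 5


def round_trip_times(link_delays_ms, shared_delay_ms):
    """RTT_i = 2 x (link_i delay + shared-link delay), same delay in both directions."""
    return [2 * (delay + shared_delay_ms) for delay in link_delays_ms]


rtts = round_trip_times(LINK_DELAYS_MS, SHARED_DELAY_MS)
print(rtts)
```

With these values the five connections span RTTs from 30 ms to 810 ms, which is the spread the fairness comparison relies on.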



3.2 The Model 

Each linki (i = 1, ..,5) is modeled by a FIFO multiple-server queue (Figure 2).
The number of servers represents the capacity of the link (measured in number
of segments), and the service time represents the link delay.

Si is modeled by a FIFO queue which holds an infinite number of segments (in this
paper we consider long-lived connections); its service depends on the ACKs received
and, of course, on the algorithm used in the congestion avoidance phase (the CANIT
algorithm or the standard one).

The Gateway is a FIFO multiple-server queue where the number of servers
represents the capacity of linksh and the service time represents the linksh
delay.



Fig. 2. Queuing model for the links of configuration 1 (source Si and Gateway)
3.3 Performance Parameters

In order to compare the performance of the CANIT algorithm with that of the
standard congestion avoidance algorithm, we use the following metrics: Fairness
and Utilization.

- Fairness: If there are N flows through a bottleneck link, a fair allocation gives
each flow 1/N of the capacity of that bottleneck link. We therefore use Jain's
fairness metric: for N flows, with flow i receiving a fraction b_i on a given link, the
fairness of the allocation is defined as

$$\text{Fairness} = \frac{\left(\sum_{i=1}^{N} b_i\right)^2}{N \sum_{i=1}^{N} b_i^2}$$

(Fairness = 1 corresponds to equal allocation for all users).

- Utilization: We define Utilization as the fraction of the available bandwidth
used by the connections:

$$\text{Utilization} = \frac{\text{number of transferred segments} \times \text{size of segment}}{\text{rate of link} \times \text{transfer time}}$$

Here, transferred segments are original ones (retransmissions are not
considered).
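
The two metrics are straightforward to compute. The sketch below uses the standard form of Jain's fairness index; the function names and the sample numbers in the usage lines are hypothetical and are not taken from the simulations.

```python
def jain_fairness(shares):
    """Jain's fairness index: (sum b_i)^2 / (N * sum b_i^2)."""
    total = sum(shares)
    squares = sum(b * b for b in shares)
    if squares == 0:
        raise ValueError("at least one flow must receive a non-zero share")
    return total * total / (len(shares) * squares)


def utilization(segments, segment_size_bits, link_rate_bps, transfer_time_s):
    """Fraction of the link capacity used by original (non-retransmitted) segments."""
    return segments * segment_size_bits / (link_rate_bps * transfer_time_s)


# Hypothetical examples: an equal split, a single flow taking everything,
# and 1000 segments of 12000 bits sent over a 1.5 Mb/s link in 10 s.
print(jain_fairness([0.2, 0.2, 0.2, 0.2, 0.2]))
print(jain_fairness([1.0, 0.0, 0.0, 0.0, 0.0]))
print(utilization(1000, 12000, 1.5e6, 10.0))
```

The index equals 1 for an equal split and falls to 1/N when one flow takes the whole link, which makes it convenient for comparing allocations across different numbers of flows.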



In 1^, we show that the CANIT algorithm successfully improves fairness in TCP
congestion avoidance. With the CANIT algorithm, the performance parameters
(Fairness and Utilization) are better than with the standard
congestion avoidance algorithm. Moreover, when using the CANIT algorithm with
NIT equal to the shortest RTT of the connections sharing the same bottleneck,
the network is fairer and resource utilization is higher. In what
follows, we discuss the results for the lossy environment.



4 Simulation Results 

This section discusses the main results obtained by simulating the queuing
model described in Section 3 in the presence of losses. First, Figure 3 shows the
fairness behaviour of both algorithms. TCP is fairer when using the CANIT
algorithm (with different values of NIT) than when using the standard congestion
avoidance algorithm.

Figure 4 shows that the bandwidth utilization of the shared link is higher for
some values of NIT and that, when using CANIT with NIT = 30 ms, both fairness
and utilization are better than with the standard algorithm. That value is the
minimum of the RTTs of the 5 connections.

H. Benaboud and N. Mikou

Fig. 3. Fairness vs. NIT

In Figures 5 and 6 we use NIT equal to the shortest RTT (here 30 ms) and
simulate our configuration with different values of buffer capacity. These
figures show that TCP is always fairer when using the CANIT algorithm with
NIT = 30 ms, and that the bandwidth utilization of the shared link is higher than in
the standard algorithm case.

In Figures Q and 0 we give a comparison of the bandwidth utilization for the
long-RTT connections (connection 4 and connection 5). As these figures show,
for both of them bandwidth utilization is higher with the CANIT
algorithm than with the standard one. This result is very important, especially in the
TCP over satellite context.

Fig. 4. Bandwidth utilization of shared link vs. NIT.

Another important performance parameter is the loss probability. We show
in figures El that, even if CANIT improves fairness and bandwidth utilization,
the losses are equivalent to those of the standard algorithm, especially when buffer
capacities are greater than 800 segments (in this case).

Fig. 5. Fairness vs. gateway buffer capacity.

Fig. 6. Bandwidth utilization of shared link vs. gateway buffer capacity
5 Conclusion

The CANIT algorithm is used to improve fairness in TCP congestion avoidance.
We show, by simulating queuing models, that when the CANIT algorithm is used
in a lossy environment, the performance parameters (fairness and utilization)
are better than with standard congestion avoidance. Moreover,
when using the CANIT algorithm with NIT equal to the shortest RTT of the connections
sharing the same bottleneck, the network is fairer, resource
utilization and bandwidth utilization are higher, and the loss
probabilities are equivalent to those generated by the standard algorithm.
However, implementing the CANIT algorithm requires a change to the congestion
avoidance mechanism at the TCP sender, and NIT estimation requires an additional
mechanism. The impact of the estimation time on TCP performance with
our algorithm is left for future work.
Abstract. In this paper projects for engineering students are described.
Three different practical subjects on networking are suggested: a point- 
to-point link, a VoIP implementation on a LAN and an Office network. 
These projects are set up to offer Master students in Electrical En- 
gineering (Telecommunications option) of the University of Leuven 
(K.U. Leuven) a realistic practical background. The authors consider such 
projects as a necessity, supplementary to the theory. With those projects, 
the leap to industry after graduation becomes easier to take. 



1 Introduction 

Most courses at university teach students only a theoretical point of view. Even 
in the last year of Electrical Engineering, the practical approach is sometimes 
left behind. After graduating, students will have to handle different problems 
in professional life. Theory comes in handy, but a practical solution is needed. 
Then, the lack of practical experience during education becomes apparent. 

It is the task of a university to provide projects in which students learn to 
solve complex but realistic practical problems. Telecommunications students are 
taught courses on Networking by professor Van de Capelle. Because the authors 
believe a practical approach is necessary, the students also have to work out a 
project. The aim of this project is to offer students project skills and a strong 
knowledge of networking. These projects will be described in this paper. 

Altogether, there are 27 students. These students are divided into three major
groups. Each group has a different project to manage. The subjects are sketched
in Section 2. Inside each group, there are several small subgroups, whose missions
differ slightly.

Each group first has to analyse the given situation. Then, they have to per- 
form the appropriate measurements of traffic load. Meanwhile, an analytical 
model of the configuration has to be developed. Starting from the measurements,
the students simulate the configuration in OPNET and compare the outcomes
with those of the analytical modelling. With this knowledge, the students
predict the consequences of implementing a new technology or service. Finally, a
concrete realisation makes the project complete. 

In a first section we will focus on the different subjects proposed. Afterwards 
the supervision is described. After this, results of the students are discussed. 
Finally, the realisations inside the projects are pointed out. Further information 
can be found on the website of the projects p. 

P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 790- R021 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001
2 Outline of the Subjects

Three different subjects are treated: a point-to-point connection, a LAN-PABX 
and an Office LAN. 

2.1 Point-to-Point Connection 

The K.U. Leuven consists of two geographically separated campuses: one in Leu- 
ven (K.U. Leuven) and one in Kortrijk (KULAK). These two campuses are inter- 
connected with a 2 Mbps full duplex leased line. At this moment, only a small 
part of the leased capacity is used. 

Because of this inefficiency, the University of Leuven plans to use the remaining
capacity for phone traffic. This way, phone calls to KULAK will be free,
while external calls to zone Kortrijk and adjacent zones will be at the local rate. The
other way around is also possible: employees of the KULAK will be able to call
their colleagues at the K.U. Leuven for free, while external calls to zone Leuven
and neighbouring zones will be at the local rate. Figure 1 illustrates the above.



Fig. 1. The K.U.Leuven-KULAK case: (a) actual situation, (b) proposed situation. Both panels show the KULeuven and KULAK IP networks and the KULeuven and KULAK ISDN networks.



Besides voice, the K.U. Leuven also wants to use the link for video confer- 
encing and dial-up networking: courses and lectures can be transmitted as video 
broadcasts, meetings can take place from remote distances and the surroundings 
of the KULAK campus can connect to the K.U. Leuven IP network at local call 
rates. 

It is up to the students to examine this case and to find out what is 
(im)possible. They also have to choose between two possible ways of transport- 
ing the voice traffic: by circuit switching or by packet switching. They have to 
consider the differences between the ISDN standard and other codecs [2], paying
special attention to capacity demand, packet loss and delay. 

2.2 LAN-PABX 

In most residences of the K.U. Leuven, students have a LAN connection in their 
room. This LAN is typically a shared Ethernet [3], common to the whole residence.
In the bigger residences, the LAN is a partly switched Ethernet PJ. The
LAN is connected to the Internet. There are also collective phones, enabling
students to make internal as well as external phone calls. Of course the LAN 
and phone network are separate infrastructures. 

Today, it is no longer needed to keep these two networks separate. In the 
LAN-PABX concept the LAN is also used for phone traffic. Computers with a 
microphone and speakers replace phones. The PABX is replaced by a server tak- 
ing care of the signalling. Another server, the gateway, performs the conversion 
to the external phone lines. 



Fig. 2. The LAN-PABX case: (a) actual situation, with the residence LAN connected to the Internet and the residence phone network connected to the global phone network; (b) proposed situation.



The K.U. Leuven plans to offer students living in residences the services of 
such a LAN-PABX. Of course, a study is needed to provide enough information 
about possible problems and bottlenecks. Starting from the data traffic in an
existing residence, students have to examine whether VoIP on the LAN can
be provided or not. If so, an estimate of the offered voice quality and of
the drawbacks for the existing data traffic has to be made. Figure 2 shows the
architecture of a separate LAN and PABX and that of a LAN-PABX.

The gateway between IP and ISDN can be located on site (in the residence 
itself) or in a central point (where the connection between the K.U. Leuven IP 
network and the Internet is made). In the first case the voice traffic only loads 
the LAN, in the latter case voice traffic has to travel through the whole network. 
The qualitative and economic differences between those two strategies have to 
be examined as well. 

2.3 Office LAN 

A typical application area of the LAN today is the office environment. Work- 
stations, servers, x-terminals, PCs, printers etc. collaborate using the LAN. As 
an example, the students have to look at the office LAN of ESAT-Telemic, the
research group to which the authors belong. This project focuses on the application
layer protocols that are used for file serving across the network: either
the Network File System protocol (NFS) 0 or the Server Message Block
protocol (SaMBa) jSj can be used to share physical hard disks among different
computers. This way, the directory structure presented to the user is independent
of whichever host he is actually working on: NFS or SaMBa is used to
transport the data from the hard disk where it is stored to the computer where
it is required, as if it were stored on a local hard disk.

It is up to the students to redesign the file sharing on the ESAT-Telemic 
LAN in an optimal way. This redesign is not straightforward. First of all the 
workstations can be used as file server and as application server: the worksta- 
tions can be used to store the user’s data and can be used to run applications. 
Both functions can be separated or combined: in the former option some workstations
are used exclusively as file servers and others as application servers; in the latter
option both functions are combined on a workstation. Subsequently the selection
of a file serving protocol is required. This has consequences regarding perform- 
ance, security, service guarantee, fault redundancy etc. If SaMBa is the selected 
protocol, then one or more SaMBa server(s) need to be assigned. 

Students have to calculate and simulate performance indicators, such as delay 
and network load, for several possible network architectures. After a comparison, 
the optimal office LAN configuration can be determined. 

3 Supervision 

The students were supervised by a team of three research assistants, each being 
responsible for one part of the project: measurements, analytical modelling and 
simulations. A computer classroom was reserved one afternoon a week to work 
on the project. After an introductory session, a different part of the
project was dealt with every week, alternating between measurements, analytical modelling
and simulations.

During these afternoons, a group discussion for every project showed the 
general progress and inspired students to co-operate. After the three discussions, 
they were able to ask questions about the specific topic. The afternoon sessions 
were certainly not enough to finish the project. A lot of work had to be done at 
home. In case of problems, students could contact the responsible assistant by 
email or simply go to his office. For the realisations, the team of assistants was 
reinforced with two experts sharing their experience with the students. 



4 The Approach of the Students 

In this section the approach of the students towards fulfilling the projects is 
given. Measurements, analytical modelling, simulations and realisation of each 
project are discussed separately. 
4.1 Point-to-Point Connection
Measurements 

Data Traffic. Of course, the actual data traffic on the leased line has to be 
measured. Therefore, the routers can be contacted using SNMP |S|. This method 
is based on the fact that the two routers on each side of the leased line keep track 
of the total number of bits and packets processed. With a tool like snmpget |Zj in 
Unix or Linux, information about the amount of incoming and outgoing packets 
and bytes, busy hour, the number of discards and the speed of the interfaces can 
be retrieved. 

Figure 3 illustrates the leased line between K.U. Leuven and KULAK. Measurements
are performed by polling both cisco-kulak and cisco-kulnet regularly.
On each SNMP poll the routers return the accumulated number of processed
bytes and packets. By subtracting two subsequent measurements, the number of
bytes and packets processed in the polling interval can be obtained. These
measurements were automated using Perl [B| and lasted for 11 days (no weekends
included).
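
The subtraction step can be sketched as follows. The students used Perl; this Python version is only an illustration. It assumes 32-bit SNMP counters that wrap around at 2^32 and at most one wrap per polling interval, and the readings and the 300 s polling interval are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap around at 2^32


def interval_deltas(samples, modulus=COUNTER32_MODULUS):
    """Turn cumulative counter readings into per-interval increments.

    The modulo handles a single counter wrap between two consecutive polls.
    """
    return [(later - earlier) % modulus
            for earlier, later in zip(samples, samples[1:])]


def average_rate_bps(byte_deltas, poll_interval_s):
    """Average bit rate during each polling interval."""
    return [8 * delta / poll_interval_s for delta in byte_deltas]


# Hypothetical byte-counter readings; the counter wraps before the third poll.
readings = [4294967000, 4294967200, 704]
deltas = interval_deltas(readings)
rates = average_rate_bps(deltas, poll_interval_s=300)
print(deltas, rates)
```

Polling often enough that a counter cannot wrap twice between polls keeps this simple correction valid.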



Fig. 3. Measurements on the leased line (routers cisco-kulnet and cisco-kulak between the KULeuven and KULAK IP networks)

Fig. 4. Packets arriving at cisco-kulnet



The measurements were afterwards condensed into easy-to-interpret figures.
Figure 4 is an example. It shows the average number of packets arriving at
cisco-kulnet during a day. From the graph, the busy period can be extracted.
As can be seen in Fig. 4, the busy period starts at 10h00 and lasts until
17h00. Only the values measured during this busy period are used in what follows,
because during this period the leased line will suffer the
most from the added voice traffic.

For other graphs, we refer to the website of the projects (follow the link
to the reports) p. The obtained values and characteristics are needed for the
analytical modelling and simulation.

Phone Traffic. Calling information is also needed. The number of calls from 
K.U. Leuven to KULAK and vice versa, a profile of the call duration and busy 
hour has to be extracted from the call records of the K.U. Leuven. In these call 
records, each outgoing call is registered, along with the source, destination and 
duration. 
Fig. 5. Call frequency from K.U. Leuven to KULAK

Fig. 6. Probability density of call duration



Some results obtained by the students are given in Figs. 5 and 6. In Fig. 5
the frequency of calls from K.U. Leuven to KULAK is shown. From this graph,
a busy hour can be extracted. Figure 6 illustrates the probability density of the
call duration. In this figure, an exponential distribution curve is fitted to the
measurements. This results in two values, λ and μ, both needed in modelling as
well as in simulations.
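
One common way to obtain such parameters is sketched below. It is an assumption, not the students' documented procedure: the first value is taken to be the call arrival rate in the busy hour and the second the reciprocal of the mean call duration, estimated by maximum likelihood. The call durations and the call count are hypothetical.

```python
def fit_exponential_rate(durations_s):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    return len(durations_s) / sum(durations_s)


def offered_load_erlang(calls_per_hour, service_rate_per_s):
    """Offered traffic in erlangs: arrival rate divided by service rate."""
    arrival_rate_per_s = calls_per_hour / 3600.0
    return arrival_rate_per_s / service_rate_per_s


# Hypothetical call records: three calls of 60 s, 120 s and 180 s,
# and 30 calls during the busy hour.
mu = fit_exponential_rate([60, 120, 180])
load = offered_load_erlang(calls_per_hour=30, service_rate_per_s=mu)
print(mu, load)
```

The resulting load in erlangs is the usual input for a trunk-dimensioning calculation.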



Analytical Modelling. Voice applications put high restrictions on the total 
end-to-end delay. This delay mainly consists of a propagation time, a coding- 
decoding delay in the gateway and queueing delays in the network. Depending on 
the source, the maximum tolerable delay for an interactive voice communication 
ranges from 150 to 250 ms. 

To save bandwidth, students did not use the G.711 codec on the leased line
(as used in the ISDN standard), but the G.723.1 codec, resulting in a data rate of
5.3 or 6.3 Kbps. If only one voice frame is included in a single packet, the bitrate
at IP level increases to 16 Kbps (including the RTP, UDP and IP headers).
Each voice frame has a duration of 30 ms, which gives rise to a packet rate of 33
p/s. In this case, the coding-decoding delay in the gateways amounts to 85 ms.
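
The 16 Kbps figure can be reproduced with a short calculation. The sketch assumes the 5.3 Kbps mode of G.723.1 with its 20-byte frame, and uncompressed RTP (12 bytes), UDP (8 bytes) and IPv4 (20 bytes) headers; the function names are illustrative.

```python
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20
FRAME_DURATION_MS = 30  # one G.723.1 frame covers 30 ms of speech


def ip_level_bitrate_bps(frame_bytes, frames_per_packet=1):
    """Bit rate at IP level when frames_per_packet voice frames share one packet."""
    packet_bytes = (frames_per_packet * frame_bytes
                    + RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES)
    packet_interval_ms = frames_per_packet * FRAME_DURATION_MS
    return packet_bytes * 8 * 1000 / packet_interval_ms


def packet_rate_pps(frames_per_packet=1):
    """Packets per second generated by one voice source."""
    return 1000 / (frames_per_packet * FRAME_DURATION_MS)


# Assumption: 20-byte frames (5.3 Kbps mode), one frame per packet.
print(ip_level_bitrate_bps(20), packet_rate_pps())
# Packing two frames per packet lowers the overhead at the cost of extra delay.
print(ip_level_bitrate_bps(20, frames_per_packet=2))
```

The headers are twice the size of the payload here, which is why the IP-level rate is about three times the codec rate.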

The propagation time is less than 1 ms and can therefore be neglected. Most
of the effort is spent on the analysis of the queueing delays in the routers
(cisco-kulnet and cisco-kulak).

A simplified model for a router is depicted in Fig. 7. The router consists of a
central queue in which all packets arrive and an outgoing queue for each interface.
The speed of the central queue represents the routing capacity expressed in
packets/second. It is hardware specific: for the cisco-kulnet router it is 100000
p/s and for the cisco-kulak router 40000 p/s. The speed of the interface queue
depends on the type of interface and is expressed in bits/second. Our leased
line has a capacity of 2 Mbps. The gateways have an Ethernet connection to the
routers.

For simplicity, interarrival times can be taken as exponentially distributed. Buffer
capacity is assumed to be infinite, so no packet loss can be modelled this way. An
M/D/1 queue can be used to model the central route engine queue: an average
address lookup takes the same amount of time regardless of the packet size. A
switch, on the other hand, requires an M/G/1 queue: the service time depends
on the packet length.
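
For both queue types the mean waiting time follows from the Pollaczek-Khinchine formula. The sketch below is a generic illustration, not the students' model: the routing capacity of 100000 p/s comes from the text, while the arrival rate of 50000 p/s is a hypothetical load.

```python
def mg1_mean_wait(arrival_rate, mean_service, second_moment_service):
    """Pollaczek-Khinchine mean waiting time in queue for an M/G/1 system."""
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        raise ValueError("queue is unstable (utilisation >= 1)")
    return arrival_rate * second_moment_service / (2.0 * (1.0 - rho))


def md1_mean_wait(arrival_rate, service_rate):
    """M/D/1 special case: every packet takes exactly 1 / service_rate seconds."""
    service_time = 1.0 / service_rate
    return mg1_mean_wait(arrival_rate, service_time, service_time ** 2)


# Routing capacity of cisco-kulnet from the text; the load is a hypothetical value.
wait_s = md1_mean_wait(arrival_rate=50000.0, service_rate=100000.0)
print(f"mean wait in the central queue: {wait_s * 1e6:.1f} microseconds")
```

For an interface queue, the first and second moments of the service time would be derived from the measured packet-size distribution and the line rate.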




Fig. 7. A simple model for a switch or a router

Fig. 8. Total packet delay from K.U.Leuven to KULAK



Figure 8 shows the total delay induced in the cisco-kulnet router as a function
of the number of employees on the K.U.Leuven campus (a scale relative to the
current situation is used).

Students also calculated the delay values in case voice packets have priority
over data packets. The difference in queueing delay becomes clear in Fig. 9.

A gateway can handle only a limited number of simultaneous calls. The probability
of blocking, i.e. the probability that all lines are busy, can be modelled by the Erlang B
formula. The maximum allowed blocking probability determines the minimum
number of trunks (and thus gateways) needed. The blocking probability for the
current phone load of the K.U.Leuven can be found in Fig. 10.
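
The dimensioning step can be sketched with the usual recursive form of the Erlang B formula. The offered load and blocking target in the usage lines are hypothetical values, not the measured K.U.Leuven load.

```python
def erlang_b(trunks, offered_load):
    """Blocking probability for offered_load erlangs on the given number of trunks.

    Uses the recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, trunks + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def min_trunks(offered_load, max_blocking):
    """Smallest number of trunks whose blocking probability does not exceed the target."""
    if not 0.0 < max_blocking <= 1.0:
        raise ValueError("max_blocking must be in (0, 1]")
    trunks, blocking = 0, 1.0
    while blocking > max_blocking:
        trunks += 1
        blocking = offered_load * blocking / (trunks + offered_load * blocking)
    return trunks


# Hypothetical example: 1 erlang of offered traffic, at most 25% blocking.
print(erlang_b(2, 1.0), min_trunks(1.0, 0.25))
```

The recursion avoids the factorials of the closed-form expression, so it stays numerically stable for large trunk counts.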





Fig. 9. Interface queueing delay from K.U.Leuven to KULAK

Fig. 10. Gateway blocking probability for current K.U.Leuven phone load
Due to space limitations we cannot give all results here. For more information,
we refer to the website of the design projects jj. 



Simulations. As for the measurements and the modelling, we can only give some
results of the simulations here. For more detailed information,
we refer to the website of the projects [Q.

For simulating the configuration, OPNET P| was used. Reality had to be
imitated as accurately as possible. An example of such a network topology in
OPNET is shown in Fig. 11.

Once the simulations run properly, they yield performance parameters.
These parameters, such as traffic load, delay, packet loss and available capacity,
can be compared to the measured and modelled values. An example of a plot
produced by OPNET is given in Fig. 12. The figure shows the queueing delay
between cisco-kulnet and cisco-kulak.

4.2 LAN-PABX 

Measurements. The traffic on the LAN of the residence has to be recorded.
For this, tcpdump can be used. Transmission capacity, a useful parameter when
implementing VoIP, can be measured with netperf m- In case the gateway is
located at a central point, the data traffic on the K.U. Leuven IP network has
to be included as well. An estimate also has to be made of the voice traffic that
will be generated by a student in a residence.

In this limited amount of space, we can only give some of the results. For 
more detailed information, we refer to the website of the projects fp. 

On the LAN, the students measured the first 10,000 packets in each hour for
one week. From these measurements, they extracted the busy period. This
period is shown in Fig. 13. As can be seen in the figure, the 10 Mbps Ethernet
is almost saturated during the busy period.

Another interesting graph is shown in Fig. 14. This figure shows
the distribution of the packet sizes as they were recorded on the LAN. These
outcomes can be imported directly into OPNET.

The measured data can be split up into different classes according to the 
applications which generate the data (ftp, mail, www, . . .). Again, we refer to 
the website for more information. 

Using snmpget on the routers in the K.U. Leuven IP network, figures similar
to the ones in Section 4.1 can be constructed. The principle is the same, so we
will not go into further detail.



Analytical Modelling. The voice application is no longer used in a simple 
point-to-point environment, but on a shared Ethernet. As stated before, there 
is a maximum acceptable delay to maintain a good quality. 

Here too, the G.723.1 codec is used to save bandwidth. Students in the
residence can make a phone call either with an IP phone or with a software tool
installed on their computer. Both act as a gateway. The PC’s soundcard together
with a microphone causes about 30 ms of delay. The encoding-decoding delay
is still 85 ms.

Fig. 11. A network topology in OPNET

Fig. 12. Queueing delay between K.U. Leuven and KULAK

Fig. 13. Busy period on the LAN (traffic on 26/11, by hour)

Fig. 14. Packet size distribution on the LAN

For internal calls (on the same subnet) the network delay is caused by the 
Ethernet. For external calls an additional delay is induced by other network 
elements, such as routers. 
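
The delays quoted in the text leave a limited budget for the network itself. The sketch below simply subtracts the fixed delays (30 ms for soundcard and microphone, 85 ms for encoding-decoding) from the tolerable end-to-end delay of 150 to 250 ms mentioned for the point-to-point case. It assumes the soundcard delay is counted once and that propagation delay is negligible; the names are illustrative.

```python
SOUNDCARD_DELAY_MS = 30          # PC soundcard together with a microphone
CODEC_DELAY_MS = 85              # G.723.1 encoding-decoding delay
TOLERABLE_RANGE_MS = (150, 250)  # maximum tolerable end-to-end delay, by source


def network_delay_budget(max_end_to_end_ms, fixed_delays_ms):
    """Delay left for the Ethernet and other network elements."""
    return max_end_to_end_ms - sum(fixed_delays_ms)


budgets = [
    network_delay_budget(limit, [SOUNDCARD_DELAY_MS, CODEC_DELAY_MS])
    for limit in TOLERABLE_RANGE_MS
]
print(budgets)
```

Under the stricter limit only a few tens of milliseconds remain for queueing on the shared Ethernet and in the routers, which is why the network delay has to be analysed carefully.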

The delay and the packet loss on the shared Ethernet can be calculated using
one of the two models described in HU- They were the only practically useful
models including the truncated binary exponential back-off algorithm that we found
in the literature. Both models assume a Poisson process for the arrival of frames and
for the total arrival of frames and retransmissions. In the more complex of
the two models, the collision probability of a frame depends on its current back-off
state.
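
For readers unfamiliar with the back-off rule that these models capture, a minimal sketch follows. It implements the truncated binary exponential back-off as used by CSMA/CD Ethernet (window doubling capped after 10 collisions, frame dropped after 16 attempts, 51.2 microsecond slot time at 10 Mbps); it is not one of the two analytical models, and the function names are illustrative.

```python
import random

BACKOFF_LIMIT = 10   # the window stops doubling ("truncated") after 10 collisions
ATTEMPT_LIMIT = 16   # the frame is discarded after 16 failed attempts
SLOT_TIME_US = 51.2  # slot time of 10 Mbps Ethernet, in microseconds


def backoff_slots(collisions, rng=random):
    """Number of slot times to wait after the given number of successive collisions."""
    if not 1 <= collisions < ATTEMPT_LIMIT:
        raise ValueError("frame is discarded after 16 attempts")
    window = 2 ** min(collisions, BACKOFF_LIMIT)
    return rng.randrange(window)  # uniform over 0 .. window - 1


def mean_backoff_us(collisions):
    """Expected waiting time in microseconds for a given back-off state."""
    window = 2 ** min(collisions, BACKOFF_LIMIT)
    return (window - 1) / 2 * SLOT_TIME_US


rng = random.Random(1)
print([backoff_slots(n, rng) for n in (1, 2, 3)], mean_backoff_us(3))
```

Because the expected wait grows with the back-off state, a model that tracks this state per frame can capture the dependence of the collision probability on it.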

The measurements show that most of the traffic is generated by a limited
number of users. Only 4 heavy users are counted during a busy hour. The Ethernet
delay as a function of the number of heavy users is depicted in Fig. 15.
It is clear that the subnet is near its maximum capacity. An upgrade to a
100 Mbps or a switched Ethernet is necessary to implement the voice service in
the residences.

In case of external calls, the delay of other network components can be cal- 
culated in the same way as described for the point-to-point project. More infor- 
mation can be found on the website of the design project p. 




Fig. 15. Ethernet delay as function of the number of heavy users

Fig. 16. Delay jitter during a simulation (horizontal axis: time in seconds)



Simulations. As in Section 4.1, this configuration was simulated in OPNET p.
Again, different performance parameters can be extracted from the simulations.
As an example, we give the evolution of the delay jitter during a simulation
in Fig. 16. In this simulation the measured amount of traffic on the LAN
is generated along with supplementary voice traffic. This simulation gives an
approximation of what would happen if VoIP were implemented on the LAN.

Other plots can be found on the website of the projects p.

4.3 Office LAN 

Measurements. A first set of measurements is performed to obtain the required 
data on the load and use of the network file service. Out of these measurements, 
the common profile of a user on the ESAT-Telemic network is extracted. This 
profile includes the number of files a normal user requires in one hour, the dis- 
tribution of the interarrival time and the size of these files. The profile is also 
refined by looking at applications in the measured traffic. The amount of traffic 
generated by different applications (www, ftp, nfs, X11, smb, ...) is determined.
The profile can be found in the reports, available on the website of the projects
P-

The total load of the office LAN has to be measured as well. For this,
tcpdump can be used. Different characteristics of the data traffic can be determined:
the average load on the LAN, the peak load, the packet size distribution,
the interarrival times of the packets, etc. As an example, Fig. 17 shows the packet
size distribution of the packets measured on the LAN during the busy hour.

Next to the load measurements, the performance of the workstations needs to 
be analysed as well. Here the difference between an old and a newer workstation 
should be taken into account. In an optimal scenario the newest and quickest 
workstations should be used as application and file servers, the older ones can 
be protocol servers (e.g. SaMBa servers). 



Table 1. Time to copy one byte of an average file and speed multiplier

Workstation   Time (ms)   Speed Multiplier
Hercule            1.29              14.35
Toine              5.17               3.59
Blondine           5.43               3.42
Loebas            18.58               1.00
Kastaar           15.38               1.21
Duchesse           6.51               2.85
Zulte             16.16               1.15



Table 1 shows the results of such a measurement. In the first column, the
different workstations of Telemic are listed. The next column gives, for each
workstation, the time to copy one byte of a file of average length. As
can be seen, Hercule is the fastest workstation and Loebas is the slowest one. In
the third column, a relative speed multiplier is assigned to each workstation.
The slowest workstation, Loebas, is assigned the multiplier 1. The multiplier
indicates how many times faster a workstation copies files than Loebas. More
results can be found on the website Q-
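
The multipliers can be recomputed from the copy times as the ratio of the slowest time to each workstation's time. This is a sketch of that arithmetic using the times from Table 1; the variable and function names are illustrative.

```python
# Time to copy one byte of an average file, in milliseconds (Table 1).
COPY_TIME_MS = {
    "Hercule": 1.29,
    "Toine": 5.17,
    "Blondine": 5.43,
    "Loebas": 18.58,
    "Kastaar": 15.38,
    "Duchesse": 6.51,
    "Zulte": 16.16,
}


def speed_multipliers(copy_times):
    """Speed of each workstation relative to the slowest one (multiplier 1)."""
    slowest = max(copy_times.values())
    return {name: slowest / time for name, time in copy_times.items()}


multipliers = speed_multipliers(COPY_TIME_MS)
for name, value in sorted(multipliers.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {value:6.2f}")
```

Recomputed this way, the values agree with the table to two decimals except for Hercule, which comes out at about 14.40 rather than 14.35; this may simply reflect rounding of the published copy times.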





Fig. 17. Packet size distribution on the office LAN

Fig. 18. Ethernet delay as a function of network utilisation
Analytical Modelling. The file serving delay consists of two parts: the network
delay and the server delay. 

The network delay mainly consists of the delay in the shared Ethernet. It
can be calculated as described above in Section 4.2. Figure 18 shows that the
network utilisation is low and the delay is very small. The network is clearly not
causing any trouble.

The number of file transfers (differentiated into small, medium and large files) 
per time unit, together with their service times should give enough information 
to calculate the server delay. 

A file server can be modelled by an M/G/1 queue, either receiving complete
files or the separate packets used to transfer a file. This difference between units
has its consequences. In the former case simultaneous file transfer is impossible
but a simple combination of M/D/1 queues offers very quick results. The delay
of the file server is much higher than the network delay and will therefore be the
bottleneck of the system. The latter case better fits reality but interpretation
of the results is far more complicated. Now a single file consists of multiple
packets. The total delay is not equal to the mean packet delay multiplied by the
number of packets. It is not clear how to calculate the delay of a file when its
packets are mixed with those of other files.
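
The whole-file case can be sketched with the standard Pollaczek-Khinchine result for an M/G/1 queue, in which the mean time in the system is E[S] + λE[S²] / (2(1 − ρ)) with ρ = λE[S]. The snippet assumes Poisson arrivals per file class and a fixed service time per class (small, medium, large); this is one possible reading of the "combination of M/D/1 queues", and the rates and service times in the example are purely hypothetical, not measurements from the project.

```python
def mg1_mean_delay(classes):
    """Mean delay of an M/G/1 server fed by several deterministic file classes.

    classes: list of (arrival_rate, service_time) pairs, one per file class.
    Returns (utilisation, mean time in system) using Pollaczek-Khinchine.
    """
    lam = sum(rate for rate, _ in classes)                      # total arrival rate
    mean_s = sum(rate * s for rate, s in classes) / lam         # E[S]
    mean_s2 = sum(rate * s * s for rate, s in classes) / lam    # E[S^2]
    rho = lam * mean_s                                          # server utilisation
    if rho >= 1:
        raise ValueError("unstable queue: utilisation >= 1")
    waiting = lam * mean_s2 / (2 * (1 - rho))                   # mean time in queue
    return rho, waiting + mean_s


# Hypothetical workload: (files per second, seconds per file).
workload = [(8.0, 0.005), (1.5, 0.050), (0.1, 1.000)]
utilisation, delay = mg1_mean_delay(workload)
print(f"utilisation = {utilisation:.3f}, mean file delay = {delay:.3f} s")
```

Because the second moment of the service time appears in the numerator, a small share of large files can dominate the mean delay even at modest utilisation.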

For a combined server the picture becomes even more complicated. Not only
files or packets arrive in the queue but also elementary tasks. These tasks
originate from applications running on the server. Measurements in a controlled
environment make it possible to estimate the elementary task rate for a specific
application and the service time for such a task.

This reasoning, however, is not correct. When multiple applications run at the
same time, they influence each other and produce fewer elementary tasks
per time unit as a result. It is not yet clear how to model this feedback.



Simulations. OPNET was used to simulate these configurations. The same
parameters as in the preceding sections can be retrieved from the simulations.
Results can be found on the website of the projects.

5 Realisation 

Because it is important to visualise the problems and solutions, a kind of
realisation is required. At ESAT-Telemic, there is a network lab specifically intended
for such purposes. The lab consists of several computers and some routers and
hubs.

The group handling a point-to-point connection had to rebuild the actual
situation in the lab from scratch. This helped them to learn what a
point-to-point connection is all about.

The Office LAN group had to build a small LAN and get some servers and 
clients running, using either NFS or SMB. Afterwards a protocol trace was taken 
on both SMB and NFS. This made them understand the principles of file sharing. 
The group of the LAN-PABX on the other hand had the opportunity to
help with a real implementation of VoIP. With the support and material from
Siemens, a temporary LAN-PABX was installed in a residence of the K.U. Leuven.
Of course this was a great help in understanding the VoIP concept.

6 Conclusions 

In this paper, projects are proposed for students who follow a graduate course 
on networking. The lack of practical insight after such courses is compensated 
by solving a practical problem. The proposed projects are close to reality; in 
fact the K.U. Leuven can use the outcome of the work of the students for future 
decisions. 
Abstract. E-commerce is an increasingly significant part of the global
economy. Users of E-commerce Web sites often have high expectations for the
quality of service, and if those expectations are not met, the next site is only a
click away. A number of performance problems have been observed for
E-commerce Web sites, and much work has gone into characterising the
performance of Web servers and Internet applications. However, the customers
of E-commerce systems are less well studied. In this work we seek to quantify
customer satisfaction with a Web site. We observe that customers may be
categorised either based on their satisfaction ratings, or on other factors (in
which case satisfaction can be assessed for the customer class). Web sites may
then be evaluated relative to the different customer categories, with potentially
more care being given to the satisfaction of high priority customers. We present
a methodology for deriving customer satisfaction, and apply it to the evaluation
of two academic Web sites.



1 Introduction: E-commerce Customer Satisfaction 

The World Wide Web (WWW, or Web) is one of the most important Internet 
services, and has been largely responsible for the phenomenal growth of the Internet 
in recent years. An increasingly popular and important Web-based activity is E- 
Commerce, in which various types of financial transactions are carried out or 
facilitated using the Web. It is widely expected that E-Commerce activity will 
continue to grow and be a significant component of the global economy in the near 
future. In an area such as E-Commerce, users often have definite expectations about 
the service they receive (based on similar non-Internet transactions) and/or about Web 
service in general (based on previous Internet usage). Therefore, E-Commerce users 
typically demand high quality service, such as: desired information is easy to find, 
accurate, and current; the retrieved information is of good quality; establishment and 
download delays are low; and there is high availability of the Web server. 

A number of performance problems in E-Commerce systems have been 
observed, mainly due to heavier-than-anticipated loads and the consequent inability to 
satisfy customer requirements. E-Commerce systems have often been designed under 
tight delivery schedules and without due regard for the impact of different possible 
loadings on the performance of the system. This has resulted in a lot of work 
attempting to characterise the performance of Web servers and Internet applications 
e.g. [1], [2], [3], [4]. However, the customers of these E-Commerce systems are less 
well studied, despite the fact that some surveys show considerable dissatisfaction with 
current E-Commerce and Web servers. For example, it has been reported that as many
as 60% of users typically cannot find the information they are looking for in a Web 
site, even though the information is present [5]. Similarly, customer dissatisfaction 
with WAP phones is attributed to difficulties in accessing Internet applications [6], 
including low access speeds and the large number of menus traversed to access 
information. In an area such as E-Commerce, customers demand high quality service, 
and it is easy for them to move away to another site if they perceive the current one to 
be unsatisfactory. 

Determining how well customer requirements are met for all users of a Web site 
is difficult, and is often based on unrepresentative sampling or anecdotal evidence or 
on the experience of a “generic” user. By quantifying the factors involved in these 
requirements, it is possible to analyse and predict customer satisfaction with an
E-Commerce site in a more scientific manner. Customer satisfaction may be quantified
using various parameters, such as response time, number of clicks needed to find the
desired information, amount of information the customer is required to give, quality of
the information they are sent, security of the interaction, and predictability of the service
received. It is entirely possible that different users will attach different satisfaction 
levels to the parameters and also that they value the satisfaction of some parameters 
over others. 

Since customer satisfaction is critical to the success of E-Commerce systems, 
characterising the customer’s requirements is an important consideration in designing 
such systems. Our approach involves: 

Constructing a customer model which captures the satisfaction that the 
customer gets when using the site; 

Dividing the customers into distinct categories based either on how they 
judge their satisfaction, or on some other parameters. 

Web server performance may then be assessed relative to the different customer 
categories. Those parameters that the customers value most should be optimised first, 
and if certain categories of customers are more “valued”, the server can be geared to 
maximising those customers’ satisfaction level. Potentially, servers may even be 
designed to serve different customer categories differently. 

In the following we discuss customer categorisation and satisfaction. We present a 
methodology for quantifying customer satisfaction. Finally, we demonstrate our 
methodology in the assessment of customer satisfaction for two Web sites. 



2 Customer Satisfaction and Categorisation 

Our goal is to quantify customer satisfaction with a Web site in a uniform way. To do 
this we must be able to measure customer satisfaction according to the assessment of 
various parameters. This measurement must be mapped to some fixed satisfaction 
scale (e.g. 1 to 10 with 10 being the best). Finally, categories of customers must be 
defined which have similar satisfaction measures. 

Some of these parameters may be measured in an objective manner. For instance, 
response time can be measured in seconds. Similarly, the number of clicks to reach a 
given piece of information can be counted. Thus only the task of mapping the 
measurements onto the satisfaction scale remains. Other parameters are not as easy to 
measure objectively. For instance, how does one determine the ease or difficulty of
navigating a Web site to find specific information? What one customer sees as
simple, another may find confusing and unmanageable. Customer surveys might help
to determine good measures for these less easily measured parameters. Even for
parameters with objective measurements, mapping to a fixed scale may be done in
various ways. For instance with response times, one customer may be willing to wait
one minute, but no longer, while another becomes gradually more dissatisfied with
waiting.
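
The two attitudes to waiting described above can be sketched as two different mappings of the same measured response time onto the 1-to-10 scale. The function names, the linear shape of the gradual mapping and the 60-second limits are illustrative assumptions; the text only states that one customer tolerates a minute and no more, while another becomes gradually more dissatisfied.

```python
def threshold_satisfaction(response_s, limit_s=60.0, lo=1, hi=10):
    """Customer who is fully satisfied up to limit_s and not at all beyond it."""
    return hi if response_s <= limit_s else lo


def gradual_satisfaction(response_s, worst_s=60.0, lo=1, hi=10):
    """Customer whose satisfaction falls linearly from hi (0 s) to lo (worst_s)."""
    fraction = min(max(response_s / worst_s, 0.0), 1.0)  # clamp to [0, 1]
    return hi - (hi - lo) * fraction


# The same measurements lead to different satisfaction values per customer.
for seconds in (5, 30, 60, 90):
    print(seconds, threshold_satisfaction(seconds), gradual_satisfaction(seconds))
```

Keeping the measurement (seconds) separate from the mapping (a function per customer or customer class) is what allows one set of measured traces to be re-evaluated for several customer categories.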

Different customers may assign different satisfaction levels to different 
parameters. For instance, one customer may decide they are willing to give 
information such as name, address and telephone number to a Web site, while another 
may find it unacceptable to give any information at all. In addition, different 
customers may apply different weightings to the same parameters in judging their 
overall satisfaction with a Web site. At one extreme, this includes the case where one 
or more of the parameters gets a zero weighting, e.g. if all the customer cares about is 
response time, all other parameters are zero-weighted. Combining these parameters in 
order to quantify overall satisfaction leads to categories of customers with similar 
satisfaction measures and similar weightings. 

Ideally, customer categorisation classifies all customers based on how they judge 
their satisfaction with an E-Commerce system, grouping together those with similar 
requirements. In practice, it is not possible to determine the customer satisfaction for 
each individual customer. Instead, customers can be divided into distinct categories 
in some other way such as large/medium/small budget; type/speed of Internet 
connection the customer has; or frequent/previous/new customer. Customers in 
distinct categories may very well have similar satisfaction requirements (at least for 
some satisfaction measures). For example, if customers are divided according to 
means of access, WAP phone users might be willing to tolerate longer delays than 
PC-based users, as well as preferring a low number of hops to access information. 



3 A Methodology for Quantifying Customer Satisfaction 

The first step in our approach to assessing Web site performance via customer 
satisfaction is defining a customer list C consisting of one or more customers and a 
parameter list P consisting of features of a Web site which will potentially affect 
customer satisfaction. For each customer in C, their behaviour is defined in terms of 
their interaction with the Web site. A trace behaviour for a customer is defined as the 
series of clicks and other information that the customer exchanges with the site. 
Typical behaviour for a customer or customer class may then be defined as one or more
traces and satisfaction is measured relative to these trace behaviours. By further 
associating customers with customer classes we may arrive at a satisfaction measure 
for customer classes as well. 

For any trace behaviour, a measure of customer satisfaction relative to the 
parameter list can be defined as follows: 

1) For each parameter $p \in P$, define a quantification of satisfaction $Q_p$. For
instance, if $p$ is the number of clicks, $Q_p$ is easily defined as an integer
value. Other parameters may have more subjective quantifications. For
instance, how does one quantify the "quality" of information available at
a Web site?

2) For each customer/parameter pair $(c,p) \in C \times P$, let a mapping
$Q_p \to Scale$ take the quantification of satisfaction to a fixed range of values
$Scale = [min, max]$. This mapping allows a large number of parameters
to be compared in a uniform fashion. Note that Scale may take on either
discrete or continuous values. Even in the case that a parameter is easily
quantifiable (as in the number of clicks), the mapping of this to Scale
may be subjective according to customer class. A WAP phone user may
find it unacceptable to make 10 clicks to reach some information, while
a PC-based user may find this tolerable.

3) For each customer, an n-dimensional satisfaction vector may now be 
defined (where n is the number of elements in the parameter list) 
incorporating the customer satisfaction for all parameters. This 
satisfaction vector can be thought of as the "raw data" for determining 
customer class satisfaction. 

4) Different customers may value Web site parameters differently. For
each customer class $c$, determine a satisfaction weighting for each
parameter $i$, denoted $W_c(i) \in [0,1]$, where

$$\sum_{i=1}^{n} W_c(i) = 1$$

These weightings are used along with the satisfaction vector from step
(3) to define an overall satisfaction level for each customer trace.

Thus for each behaviour trace we have arrived at a satisfaction measure for each 
individual parameter and for the Web site as a whole. By defining customer class as a 
collection of trace behaviours, we may extend our satisfaction measure to customer 
classes. 

5) Let a customer class be defined as a collection of trace behaviours. The 
class satisfaction measure is defined as a weighted sum of the trace 
satisfaction measures. (It may be considered that some behaviour is 
exhibited more frequently by a user in a Class, and this behaviour 
should be given higher weighting). 

6) Finally a weighting of customer classes can be defined, allowing for an 
overall satisfaction measure for the Web site. By varying this 
weighting, we can study how favouring certain customer classes over 
others affects overall customer satisfaction with the site. 
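
Steps 3 to 6 above amount to three nested weighted sums, which can be sketched as follows. The function names, the data layout and all numeric values are illustrative assumptions; the paper does not prescribe an implementation, and the only constraint taken from the text is that each set of weights lies in [0,1] and sums to 1.

```python
def weighted_sum(values, weights):
    """Weighted sum with the normalisation required of the weightings."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(v * w for v, w in zip(values, weights))


def trace_satisfaction(vector, param_weights):
    """Steps 3-4: combine an n-dimensional satisfaction vector with the weightings."""
    return weighted_sum(vector, param_weights)


def class_satisfaction(traces, param_weights):
    """Step 5: traces is a list of (trace_weight, satisfaction_vector) pairs."""
    scores = [trace_satisfaction(vec, param_weights) for _, vec in traces]
    return weighted_sum(scores, [w for w, _ in traces])


def site_satisfaction(classes):
    """Step 6: classes is a list of (class_weight, traces, param_weights)."""
    scores = [class_satisfaction(traces, pw) for _, traces, pw in classes]
    return weighted_sum(scores, [w for w, _, _ in classes])


# Hypothetical data on a 0-to-4 scale; parameters are (clicks, time, quality).
wap_traces = [(0.7, [1, 2, 3]), (0.3, [3, 4, 3])]
pc_traces = [(1.0, [4, 3, 3])]
site = site_satisfaction([
    (0.4, wap_traces, [0.5, 0.3, 0.2]),   # WAP users weight clicks most heavily
    (0.6, pc_traces, [0.2, 0.5, 0.3]),    # PC users weight time most heavily
])
print(f"overall site satisfaction: {site:.3f}")
```

Varying the class weights passed to `site_satisfaction` is the experiment described in step 6: it shows how favouring one customer class shifts the overall measure.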

The most difficult part of this exercise is in relating customer trace behaviour to 
the satisfaction vector. How parameter satisfaction is measured and how it is mapped 
onto the Scale must be addressed on a case-by-case basis, although experience using 
the methodology may lead to the definition of some standard cases. Also, since 
multiple executions of the same trace may lead to different values, some statistical 
analysis may be required. An overview of the methodology is given in Figure 1.

Figure 1: Overview of Methodology



4 Application of the Methodology 

The approach outlined in the previous Section has been applied to the analysis of two
university Web sites - Dublin City University School of Electronic Engineering [7]
and Florida International University [8]. For each of these sites, four classes of users
were defined: internal students, external students, internal staff, and external staff.
These user classes were distinguished by their behaviour, which in all cases was to
seek some relevant information. For each user class, multiple traces were specified
and a path weight assigned to each trace indicating its relative usage.



808 H. Graja and J. McManis 



Three Web site parameters were defined: complexity, time, and quality of 
information. The quantification and assignment of satisfaction values was the same 
for all user classes. Complexity was quantified as the number of clicks in a trace; 
time was quantified as the overall time to complete a trace; and quality of information 
was judged subjectively based on how close the information retrieved was to what 
was actually sought. The scale chosen was a simple 0-to-4, with 4 being best and 0
worst. Complexity and time were divided into bands, and each band assigned a value 
in the scale; quality of information was assigned a value based on a survey of ten 
users. 
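
The banding of complexity and time can be sketched as a lookup against a list of band edges. The band edges below are hypothetical, since the paper does not list the bands it used; only the 0-to-4 scale, with 4 best, is taken from the text.

```python
from bisect import bisect_left


def band_score(value, upper_edges, best=4):
    """Score `best` for the lowest band and one point less per band crossed.

    upper_edges holds the inclusive upper limit of every band except the last.
    """
    return best - bisect_left(upper_edges, value)


# Hypothetical bands (assumptions for illustration only).
CLICK_EDGES = [2, 4, 6, 8]      # <=2 clicks -> 4, 3-4 -> 3, 5-6 -> 2, 7-8 -> 1, >8 -> 0
TIME_EDGES = [5, 10, 20, 40]    # seconds:  <=5 -> 4, <=10 -> 3, <=20 -> 2, <=40 -> 1, else 0

print(band_score(3, CLICK_EDGES), band_score(12.0, TIME_EDGES))
```

With four edges and a best score of 4, every measurement lands on one of the five values of the scale, so complexity and time become directly comparable with the surveyed quality score.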

Data was gathered using the Web Performance Trainer 2.1 tool [9] to execute 
each of the traces on the actual Web site in question. This was necessary solely to take 
time data, and was carried out both on a weekday and a weekend day. The other two 
satisfaction values were determined by an inspection of the Web site. Tables 1 and 2 
in Appendix 1 summarise the data for the DCU and FIU Web sites, respectively. 
Analysis of the results and what conclusions we can draw from this study are given 
below. 



5 Conclusions and Future Work 

Modelling customer satisfaction with Web and E-Commerce sites is not as well 
studied as Web server modelling, but determining whether and how the customers of 
these sites are satisfied with their interactions is becoming more important as the Web 
matures. We have proposed a methodology for estimating how satisfied defined 
classes of customers are with a Web site. Our approach recognises that customer 
satisfaction is a complex issue and includes factors which are not easily measured. 

We have illustrated our approach by investigating the satisfaction some typical 
users experience with two university Web sites. These were chosen for representative 
purposes only and the results do not necessarily generalise to other Web sites. 
Nevertheless some important observations can be made. For example, in each of our 
three components of customer satisfaction (complexity, time, and quality) there was a 
range of values for the four user classes for each site. This implies that different users 
of these sites will have different perceptions of how satisfied they are with their 
interactions with the site. We also noticed a wide range in some of the users' 
satisfaction components; e.g. internal students in DCU averaged 1.7 for complexity, 
3.5 for time response, and 2.9 for quality on a 4-point scale. Depending on how users 
of this type weight these components in deciding on their overall satisfaction, they 
may form a quite negative impression of the Web site even though some components 
are satisfactory. This implies that assumptions about which components matter more 
to users should be checked against actual users, in order that the Web site is not 
optimised for some satisfaction component that users rate as less important. Overall, 
we have shown that our methodology is feasible and does distinguish satisfaction 
levels between different types of users. 
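
The effect of the weighting can be illustrated with the DCU internal-student averages quoted above (1.7 for complexity, 3.5 for time, 2.9 for quality). The three weightings are hypothetical user profiles chosen for illustration, not values from the study.

```python
# Component averages for DCU internal students, as reported in the text.
components = {"complexity": 1.7, "time": 3.5, "quality": 2.9}


def overall(weights):
    """Overall satisfaction on the 0-to-4 scale for one weighting profile."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(components[name] * w for name, w in weights.items())


# Hypothetical profiles: what the user cares about most.
complexity_focused = overall({"complexity": 0.8, "time": 0.1, "quality": 0.1})
equal_weights = overall({"complexity": 1 / 3, "time": 1 / 3, "quality": 1 / 3})
time_focused = overall({"complexity": 0.1, "time": 0.8, "quality": 0.1})

print(f"{complexity_focused:.2f} {equal_weights:.2f} {time_focused:.2f}")
```

Under these assumed profiles the same measurements give an overall satisfaction of about 2.0, 2.7 and 3.3 respectively, which is the spread the paragraph above warns about.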

The next step is to investigate whether certain "generic" categories of users can
be defined, and/or whether they care about "generic" Web site parameters (e.g. it
seems likely that download time will always be a factor in user satisfaction). Given a
specific Web site, we will explore methods for mapping these generic user types and
satisfaction parameters into the site's content. If an analysis of the resulting
satisfaction measures shows that there is a disparity in the satisfaction of different
user types, we will study how the Web site designer or administrator should take this 
into account, and whether their reaction can be determined dynamically while the user 
is interacting with the site. 
Appendix 1: Tables

Table 1: Customer Satisfaction results for www.eeng.dcu.ie

Table 2: Customer Satisfaction results for www.fiu.edu
Abstract. This paper describes how XML documents can interact with internet
smart cards. Such a card works as an internet node, including a web server.
Because XML is made up of entities transported by the HTTP protocol, it is
possible to import XML entities from a smart card. We describe an
original process for strong user authentication, which illustrates how an internet
card can work with XML documents and improve security.



1 Introduction 

Smartcards are generally recognized as the best device for secure computing and data
storage. But until now no real effort has been made to integrate smart cards into
internet technologies. Because internet applications are based on the client/server
paradigm, we have developed a technology which transforms a smart card into an
ordinary internet node supporting the HTTP protocol. An internet smart card can be seen
as a personal web server, which can interact with XML documents thanks to entities
identified by URLs, transported by the HTTP protocol and located in smart cards. Many
browsers today support part of the XML specifications. In this paper we
demonstrate that it is possible to incorporate smart cards in XML documents and thus
to improve security, but we emphasize that standardization is required for the
identification of smart card resources.



1.1 What Is a Smart Card 

A smart card [2] is a portable, tamper-resistant device [3], [4], [5]; it offers safe
information storage and secure processing. It contains a microprocessor (CPU), RAM,
ROM, and EEPROM, all embedded in a single tamper-protected chip (whose area
is about 25 mm²). It communicates with the outside world through a serial link
associated with a single I/O pin; hence, only a half-duplex protocol is supported (at a
baud rate from 9600 to 105900).

Although a smart card may often be pictured as a small-scale
microcomputer system (with processing unit, bus, and memory), it cannot be
considered a true computer. The lack of I/O resources such as a keyboard or screen
makes it always dependent on another computer (a terminal incorporating a card
reader) which offers these resources.
ISO 7816 standards [1] define communication protocols between terminals and
smart cards. Embedded applications communicate with the outside world by means of
Application Protocol Data Units (APDUs), which are exchanged between card and
reader through a serial link according to a command/response paradigm. Messages
(commands) are sent from the terminal to the smart card, which in turn delivers a
response ending with a two-byte status word.

A command APDU usually contains five bytes, CLA INS P1 P2 P3. The CLAss
and INStruction bytes indicate the operation type (reading or writing, for example).
The two following bytes, P1 & P2, provide further operation parameters (like an
address); the last byte, P3, specifies the length of additional data bytes. The response
message contains optional data bytes and ends with two status bytes, SW1 & SW2.
Status value "90 00" indicates an error-free operation.
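
The five-byte header and the status word described above can be illustrated with a small parser. This is a minimal sketch: the class and function names are illustrative, and the sample bytes are the SELECT command that appears in the paper's own card bug example. P3 is stored as found rather than checked against the data length, because in some commands it gives the length of the expected response instead.

```python
from dataclasses import dataclass


@dataclass
class CommandAPDU:
    cla: int      # CLAss byte
    ins: int      # INStruction byte
    p1: int       # first operation parameter
    p2: int       # second operation parameter
    p3: int       # length of the additional data bytes
    data: bytes   # additional data bytes, possibly empty


def parse_command(raw: bytes) -> CommandAPDU:
    """Split a raw command APDU into its five header bytes and its data."""
    if len(raw) < 5:
        raise ValueError("a command APDU needs the five header bytes")
    cla, ins, p1, p2, p3 = raw[:5]
    return CommandAPDU(cla, ins, p1, p2, p3, raw[5:])


def is_success(response: bytes) -> bool:
    """True when the trailing status word SW1 SW2 is 90 00."""
    return len(response) >= 2 and response[-2:] == b"\x90\x00"


select = parse_command(bytes.fromhex("00A40400054A54455354"))
print(hex(select.ins), select.p3, select.data)
```

Here the instruction byte is A4, P3 is 5, and the five data bytes are the ASCII characters of the application identifier.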



1.2 Smartcards Benefits 

Smart cards are generally recognized as the most secure computers. They basically
offer two kinds of features:

1. Secure data storage: files are stored and protected by various authentication
methods (like PIN code, mutual authentication using a DES key...).

2. Secure processing: embedded cryptographic algorithms or software are executed
inside this tamper-resistant device.

The combined use of these two functions makes smart cards suitable for electronic
signing and authentication purposes.



2 Internet Smartcards 

2.1 Goals 

An internet card [10], [11] works as an internet node, and runs client or server
applications defined by RFCs (like RFC 2068, HTTP 1.1 [9]). This innovative
concept has been implemented in javacards, and works with existing web services;
the critical parameters are the code size (around 7 kilobytes) and the data
throughput, whose measured value is around 300 bytes/second.

An internet card shares the TCP/IP stack of its associated terminal; from a logical
point of view it acts as an internet node which uses the terminal IP address and its
internet access. This sharing is achieved through a new protocol (Smart Transfer
Protocol - SmartTP), which resembles a lightweight TCP [8] and connects
autonomous software entities (named smart agents) located in both terminal and card.
On the terminal side, special agents (network agents) have access to network
resources. On the card side, agents run internet applications (HTTP ...), and reach
the internet thanks to data exchange with network agents.
[Figure labels: INTERNET; HTTP + SmartTP + APDU]

Fig. 1. Internet smartcard architecture.



2.2 Architecture 

Our internet smartcard architecture is illustrated in figure 1. We have defined a new
communication stack, which uses two symmetric layers (Smart Layers), one located in
the host system (HSL - Host Smart Layer) and the other in the smart card (CSL - Card
Smart Layer). HSL has access to network libraries and to card reader APIs. It allows
network packets to be transferred to and from the card. It establishes a logical path
between existing host applications, such as a web browser or electronic mail, and a
smart card. CSL works with the network by means of information exchanged with HSL.
A smart layer is divided in two parts:

1. Smart Transfer Protocol entity (SmartTP). 

2. Smart Agents. 

A smart agent is an autonomous software entity. It can be realized by a DLL
(Dynamic Link Library) in a PC or a cardlet in a smart card. It is identified by a
reference (a 16-bit number) which can be either constant (a well-known value) or
ephemeral.

On the host side, agents are plugged into the network resources; they provide
internet access to agents located in a smartcard.

Agents exchange information through packets called SmartTP PDUs (SmartTP
protocol data units). The SmartTP entity is a logical switch, and is in charge of routing
incoming or outgoing PDUs to/from agents.

The communication stack (figure 1) used by a network card and its associated
terminal is the following:

1. OSI layers 1 and 2 (ISO 7498 [7]), supporting ISO 7816-3 transmission protocols.

2. AMUX layer (APDU multiplexer, using either PC/SC [6] or ISO 7816 [1] services),
which routes APDUs to/from the SmartTP entity.

3. SmartTP entity, which switches SmartTP PDUs towards agents.

4. Agents, which process application data, and exchange SmartTP PDUs.
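
The role of the SmartTP entity as a logical switch can be sketched as a table from 16-bit agent references to agents. The PDU layout used here (source reference, destination reference, payload) and all class and method names are assumptions made for illustration; the paper does not give the SmartTP PDU format.

```python
from typing import Callable, Dict, NamedTuple


class PDU(NamedTuple):
    source: int        # assumed field: 16-bit reference of the sending agent
    destination: int   # assumed field: 16-bit reference of the receiving agent
    payload: bytes     # application data carried by the PDU


class SmartTPSwitch:
    """Routes PDUs to registered agents according to their destination reference."""

    def __init__(self) -> None:
        self._agents: Dict[int, Callable[[PDU], None]] = {}

    def register(self, reference: int, handler: Callable[[PDU], None]) -> None:
        if not 0 <= reference <= 0xFFFF:
            raise ValueError("agent reference must fit in 16 bits")
        self._agents[reference] = handler

    def route(self, pdu: PDU) -> bool:
        """Deliver the PDU; return False when no agent has that reference."""
        handler = self._agents.get(pdu.destination)
        if handler is None:
            return False
        handler(pdu)
        return True


received = []
switch = SmartTPSwitch()
switch.register(0x0050, received.append)   # hypothetical reference of a web server agent
delivered = switch.route(PDU(0x1234, 0x0050, b"GET / HTTP/1.1\r\n\r\n"))
print(delivered, len(received))
```

The same switching logic would sit on both sides of the card reader, with the AMUX layer below it carrying the PDUs inside APDUs.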



2.3 Basic Applications 

Web server 

A web server implements an internet protocol specified by an RFC standard (HTTP 1.1 [9]).
Its implementation in a card means that HTTP data, which are carried through the
web by TCP/IP packets, are exchanged between card and terminal by means of
SmartTP PDUs. From the application point of view an HTTP session is opened
between the client (a browser) and a web server located in the card.

A URL, http://127.0.0.1:8080, where 127.0.0.1 is the terminal IP loopback
address and 8080 the network agent TCP port, gives access to the card index (an HTML
file), which includes hyperlinks towards internal or external resources. Embedded card
resources, like cryptographic entities (cipher algorithm, digital signature,
authentication procedure), multimedia objects (HTML page, image, sound...) and
software (Java applet...) are identified by URLs.

Trusted proxy 

A proxy is a powerful and useful entity in the world of TCP/IP technology. It
includes a static TCP server and a TCP client, which is created dynamically upon
each new incoming connection to the server. The client establishes a connection to a
node, which is either pre-defined or deduced from information received over
the server connection. A proxy forwards application data (carried by TCP/IP packets)
from one TCP connection to another. A trusted proxy (embedded in a smart card) may
be used for security purposes (SSL proxy, firewall), or to perform protocol
translation.



2.4 Working with Internet Smartcard 

Usually a smart card includes several embedded applications, which are identified by a
16-byte number named the application identifier (AID). Therefore a specific APDU
(SELECT AID) is required to activate a given application.

We have defined a three-level architecture (figure 1) in order to work with an
internet smart card.

1. The first level is a network agent (agent p0) associated with a TCP port p0 (for
example p0=8082), which implements a web server and which is used to manage a
smart card. Typically it is possible to select an application located in the card by
means of a particular URL (like http://ip:p0/?write, see figure 3).

2. The second level is a network agent (agent p1) associated with a TCP port p1 (for
example p1=8080), which is used to route HTTP request messages (http://ip:p1)
towards a smartcard agent implementing a web server.

3. The third level is a network agent (TCP agent), which is used by the smart card to
establish a TCP (client) connection with a remote internet server.



3 Interactions between XML Documents and Internet Card 

3.1 XML Document and Smartcard Interactions 



Fig. 2. XML documents and internet smartcards

XML documents [12] are made up of storage units called entities which, if necessary,
are transported by the HTTP protocol; we propose to import some of them from an
internet smart card (figure 2). From a logical point of view an XML document is a
tree of one or more elements, the boundaries of which are delimited by tags.

Each element has a content, which can be either an element or another XML
object (such as an entity or character data).

A document type declaration contains or points (by means of a URL) to
markup declarations that provide a grammar for a class of documents, the document
type definition (DTD). A DTD defines
the tree structures which are allowed by the XML document issuer. A DTD can be
made up of several entities, some of which may be embedded in an internet smart card.

Unlike an HTML page, an XML document cannot be displayed directly; an extensible style
sheet language file (XSL [13]) is generally needed to build an HTML (or WML)
page, which is a human-readable representation of some element contents. This HTML page
may include software components, like scripts or applets, which will be loaded and
executed at run time by a web browser.

For example, a JavaScript script can force a redirection (deduced from the contents
of XML elements) to another web site (by setting location.href), or an applet can
process data imported from the original XML document.

Interactions between an XML document and an internet smart card occur in three
steps (figure 2):

1. First, a browser downloads a root XML document which includes pointers to
several physical XML entities, like XML document fragments, DTD and XSL.
Entities are identified by URLs; some of them are located in one or several smart
cards. The complete XML document is linked and then checked by a parser
according to the definitions found in its associated DTD.

2. Second, an HTML page is built by the XSL processor; this page can include scripts or
applets which will be invoked at run time with calling parameters deduced from
element contents.

3. Third, a browser loads the produced HTML page and runs the software
components (scripts or applets) that it includes.



3.2 Internet Smart Card Detection 

A basic requirement, from a server point of view, is to determine whether or not a smart
card is available on a given terminal. A possible solution to this problem is to use a
technique that we call a card bug. A card bug is identified by a URL which points to an
image file (whose size is typically one pixel, for example a white pixel). An HTML page
is able to detect (as shown in figure 3) that an image has been downloaded and, according
to this event, to dynamically select an XML (or HTML) document.

A card bug (figure 3) is used to detect and select a particular application embedded
in a smart card. As an example, the URL

http://127.0.0.1:8082/?write=00A40400054A54455354 

(sent to the pO agent) selects (Select APDU = 00 A4 04 00, length = 05) an JTEST 
application (AID = ‘F ’T’ ’E’ ’S’ ’T’ = 4A 54 45 53 54). 
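
The construction of this URL can be sketched in Python. This is a minimal sketch,
assuming the agent accepts the hex-encoded APDU in a write query parameter exactly
as in the URL above; the helper name select_apdu_url and its agent default are
illustrative and not part of the original system.

```python
def select_apdu_url(aid: str, agent: str = "http://127.0.0.1:8082") -> str:
    """Build a card-bug URL carrying a SELECT APDU for the given AID.

    The APDU layout follows the example in the text:
    header 00 A4 04 00, one length byte, then the AID bytes.
    """
    aid_bytes = aid.encode("ascii")
    if not 1 <= len(aid_bytes) <= 255:
        raise ValueError("the AID must fit a one-byte length field")
    header = bytes([0x00, 0xA4, 0x04, 0x00])
    apdu = header + bytes([len(aid_bytes)]) + aid_bytes
    # Hex-encode the whole APDU, upper case, as in the URL shown in the text.
    return f"{agent}/?write={apdu.hex().upper()}"


print(select_apdu_url("JTEST"))
```

For the AID "JTEST" the function reproduces the URL given above.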

A card bug embedded in a smart card (like http://127.0.0.1:8080/key1.gif) is used
to indicate the availability of a DES key whose name is key1.

Fig. 3. Card bug concept.



3.3 Example of Internet Card Resources 



Table 1. Internet smartcard resources

File name                   Meaning                               Format
/                           Smartcard index                       HTML page
/name.txt                   Bearer name - Pascal Urien            XML entity
/key1.gif                   Card bug                              GIF file
/Key1=69DA379EF99580A8F     DES encryption of an 8-byte block     XML entity
/Key1=+69DA379EF99580A84    DES' encryption of an 8-byte block    XML entity

We have designed an internet Java Card (with a code size of about 7 kilobytes) which
includes a web server and services identified by URLs. Table 1 lists the embedded
resources.

Our smart card includes an index page, which contains information about its
content; the bearer name (name.txt); a card bug (key1.gif), which indicates the
availability of a DES key named key1; and a method to compute the DES (and DES')
algorithm with the key1 value.



3.4 Strong Authentication 

A strong authentication process is illustrated in figure 4. An HTML page named
select.html is downloaded by a browser. In this page a card bug

<img src="http://127.0.0.1:8082/?write=00A40400054A54455354" ...>

is used to detect and run a smart card application named JTEST. Upon success, a
redirection occurs and a new page named login.html is downloaded. In this page a
card bug

<img src="http://127.0.0.1:8080/key1.gif" ...>

tests for the presence in the smart card of a DES key named key1. If this key is
detected, the browser loads an XML document, login.xml.

This document includes an entity (the content of the name element) which resolves
to the bearer name (&name; - http://127.0.0.1:8080/name.txt) and another (the
content of the response element) which requests the card to encrypt a random number
(the content of the challenge element, 1234) with the DES algorithm and the card
key key1 (http://127.0.0.1:8080/Key1=+1234). Once all parts of the XML document have
been collected, an XSL processor builds an HTML page which is displayed by the
browser. This page shows the bearer name (Pascal.Urien), the random number (1234)
and the DES computation (response) of this number (DES(1234) = 5702C18C3A056058).
These data are gathered by a script, which forces the browser to download a new
HTML page whose name (name.random.response) is deduced from the contents of the XML
entities (in our example the requested page is Pascal.Urien.1234.5702C18C3A056058).
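
On the receiving side, the requested page name has to be split back into its three
fields. The following is a minimal sketch, assuming the fields are joined with dots
as in the example and that only the bearer name may itself contain dots;
split_login_name is a hypothetical helper, and checking the DES response against
the server's own copy of the key is deliberately left out.

```python
def split_login_name(page_name: str) -> tuple[str, str, str]:
    """Split 'name.random.response' into (bearer name, challenge, response)."""
    # Split from the right so that dots inside the bearer name are preserved.
    parts = page_name.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError("expected name.random.response")
    name, challenge, response = parts
    # The response is a hex string; bytes.fromhex rejects anything else.
    bytes.fromhex(response)
    return name, challenge, response


print(split_login_name("Pascal.Urien.1234.5702C18C3A056058"))
```

Splitting from the right is the design choice that matters here: a plain split on
every dot would cut the bearer name Pascal.Urien in two.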



4 Conclusion 

We have demonstrated that an internet smart card can interact with XML documents.
We think that this innovative concept could improve security over the internet and
could be used to extract private information from XML documents. Obviously this
technology requires standardization efforts: first, smart card URL formats need to
be formally defined (what are the values of the associated TCP ports?), and second,
card embedded resources should be identified by well-known DTDs coupled with card
bugs.

select.html: JTEST applet selection

<img width=1 height=1 onload="on_load()" onerror="on_error()"
     src="http://127.0.0.1:8082/?write=00A40400054A54455354">

login.html: DES key1 detection

<img onload="on_load()" onerror="on_error()" src="http://127.0.0.1:8080/key1.gif">

deny.html

login.xml

login.xsl: HTML page generation

http:/name.random.response

Pascal.Urien.1234.5702C18C3A056058

login.xml

<?xml version='1.0' standalone='no'?>
<?xml:stylesheet type="text/xsl" href="login.xsl"?>
<!DOCTYPE login [
  <!ELEMENT login ANY>
  <!ELEMENT challenge ANY>
  <!ELEMENT response ANY>
  <!ELEMENT name ANY>
  <!-- Server challenge -->
  <!ENTITY random ...>
  <!-- Smart card entities: card DES(1234) and card bearer name -->
  <!ENTITY response SYSTEM "http://127.0.0.1:8080/key1=+1234">
  <!ENTITY name SYSTEM "http://127.0.0.1:8080/name.txt">
]>
<login>
  <challenge>&random;</challenge>
  <response>&response;</response>
  <name>&name;</name>
</login>



login.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
                xmlns:HTML="http://www.w3.org/Profiles/XHTML-transitional"
                language="JavaScript">
  <xsl:template><xsl:apply-templates/></xsl:template>
  <xsl:template match="text()"><xsl:value-of/></xsl:template>
  <xsl:template match="/">
    <HTML><HEAD><TITLE>Login</TITLE>
    <!-- JavaScript, executed at run time -->
    <SCRIPT language="JavaScript">
    <xsl:comment><![CDATA[
      var v = document.XMLDocument.selectSingleNode("login/challenge");
      challenge = v.nodeTypedValue;
      v = document.XMLDocument.selectSingleNode("login/name");
      name = v.nodeTypedValue;
      v = document.XMLDocument.selectSingleNode("login/response");
      response = v.nodeTypedValue;
      setTimeout("location.href = name + '.' + challenge + '.' + response", 2500);
    ]]></xsl:comment>
    </SCRIPT>
    </HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY></HTML>
  </xsl:template>
  <xsl:template match="login">
    <H1>Welcome Mr <xsl:value-of select="name"/></H1>
    <P>Challenge: <xsl:value-of select="challenge"/></P>
    <P>Response: <xsl:value-of select="response"/></P>
  </xsl:template>
</xsl:stylesheet>



HTML page generated by login.xsl

<HTML><HEAD><TITLE>Login</TITLE>
<SCRIPT language="JavaScript">
setTimeout("location.href = 'Pascal.Urien.1234.5702C18C3A056058'", 2500)
</SCRIPT></HEAD>
<BODY>
<H1>Welcome Mr Pascal.Urien</H1>
<P>Challenge: 1234</P>
<P>Response: 5702C18C3A056058</P>
</BODY></HTML>

Redirection

http:/Pascal.Urien.1234.5702C18C3A056058

Fig. 4. Strong authentication process.

Abstract. In recent years World Wide Web traffic has shown phenomenal
growth. The main causes are the continuing increase in the number of people 
navigating the Internet and the creation of millions of new Web sites. In 
addition, the structure of Web pages has become more complex, including not 
only HTML files but also other components. This has affected both the 
download times of Web pages and the network bandwidth required. The goal of 
our research is to monitor the download times of Web pages from different Web 
sites, and to find out to what extent the images contained in these Web pages 
influence these times. We also suggest some possible ways of decreasing the 
bandwidth requirements and download times of complex Web pages. 



1 Introduction 

At the beginning of the World Wide Web in the early 1990s, most Web pages were 
text based, with file sizes on the order of hundreds of kilobytes. Nowadays, Web 
pages have become much more complex. Static and animated pictures, sounds, 
dynamically generated pages and multimedia components have been included, 
increasing the typical total size of these Web pages to megabytes. In this way Web 
pages have become more attractive for their clients, but also more resource-intensive 
to send and retrieve. The immediate effects are increased delays in accessing the 
documents and overloading of the network. Thus fewer users are able to access Web 
site information in a given time period. At the same time Web traffic has become the 
most common type of traffic on the Internet, and the number of Web sites continues 
to increase dramatically. 

Most users who navigate on the Internet want to access a Web site as quickly as 
possible and don’t have the patience to wait a long time to load overly large Web 
pages. In order to increase the number of the clients who access a site and to keep 
those who are already visiting it, the owners of commercial sites should constantly 
monitor the performance of their servers. One of the most critical parameters is the 
download time, which gives a good indication of the waiting times for potential 
clients. 

Because a Web page consists of not only one HTML file, but a collection of many 
file types (e.g. HTML, images, JavaScript, Active Server Pages (ASP), cascading 
style sheets (CSS), MacroMedia’s Shockwave), the page download time is the time to 
download all the Web page’s components. While it is possible that the user might care 
about the download times of individual components, we believe the total download
time is in general important.

We have monitored the download times for different commercial sites, and 
analysed their composition with respect to file size and type. Our experimental results 
show that images represent the biggest percentage of Web page size, and hence 
account for a considerable proportion of the download time for the page. We 
observed that some sites use dynamic HTML pages generated by JavaScript files or 
tags and Active Server Pages to minimize download times. These types of files are 
run on the client machine and can produce the same effect as a static HTML file with 
a lot of images inside, but are relatively small in size. 

The main aim of our research is to determine which components make the largest 
contribution to the total download time. We also suggest some possible solutions to 
decrease Web page download times and to reduce overloading of the network. 



2 The Structure of the Web Pages

Web pages are composed of multiple object documents. The main document is an 
HTML object, which can consist of one or more HTML files. The other objects are 
inline images or animations and Java applets. A browser accesses all these objects 
from the Web server using the HTTP protocol. The number and size of the object 
documents embedded in the Web page influence the download time of the Web page. 
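
Counting the inline images of a page is the first step of such an analysis. The
following is a minimal sketch using Python's standard html.parser module; the
class and function names are illustrative, and this is not the measurement tool
used in this study.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before this call.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def inline_images(html: str) -> list[str]:
    """Return the image URLs referenced by the given HTML text, in order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


page = '<html><body><IMG SRC="logo.gif"><img src="photo.jpg" /></body></html>'
print(len(inline_images(page)), inline_images(page))
```

The length of the returned list corresponds to the "Number of Images" figure
reported per page; the sizes would still have to be obtained by fetching each URL.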

Two versions of the HTTP protocol are in use. HTTP/1.0 retrieves objects using a
separate TCP connection for each object. Thus, multiple connections are created
simultaneously between the server and the browser. As the number of components
increases, more requests must be sent to the server, thereby increasing the total
download time. Along with all the requests sent by other clients, these could
easily overwhelm the server. The second version, HTTP/1.1, supports persistent
connections between the server and the client. In this case a single connection is
used and all the document objects are delivered sequentially. HTTP/1.1 thus reduces
the connection overhead by using a single connection to fetch all the components.
However, the sequential nature of the retrieval might reduce the performance
improvement in the case of many components.
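
The difference in connection overhead can be illustrated with a small calculation.
This is a simplified sketch of the model described above (one connection per object
for HTTP/1.0, one persistent connection for HTTP/1.1); it deliberately ignores
browser connection limits and any other optimisation, and the function name is
illustrative.

```python
def tcp_connections(num_objects: int, persistent: bool) -> int:
    """Number of TCP connections needed to fetch a page with num_objects objects."""
    if num_objects < 0:
        raise ValueError("num_objects must not be negative")
    if num_objects == 0:
        return 0
    # Persistent connection (HTTP/1.1 model): everything shares one connection.
    # Otherwise (HTTP/1.0 model): one connection per retrieved object.
    return 1 if persistent else num_objects


# A main HTML file plus 90 inline images, ignoring any other component types.
objects = 1 + 90
print(tcp_connections(objects, persistent=False))
print(tcp_connections(objects, persistent=True))
```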

We analysed a number of Irish commercial Web sites, to determine the structure 
of their Web pages and the contribution of their components to the overall size. 
Specifically, we determined the structure of the main Web page (total size of the Web 
page, number of images, the percent of HTML and image files). The results are listed 
in Table 1. 

For the analysed Web sites, we see that - with the exception of Web Server 6 and 
Web Server 11 - images represent by far the biggest component of the Web pages. 
Some of the pages also have a large number of images. Both size and number of 
images could affect both network and server performance, especially in peak hours 
when there are a lot of clients visiting the page. Apart from images, the studied Web 
pages included other multimedia components such as JavaScript, ASP, and 
MacroMedia’s Shockwave files. 

To find out how images influence the access time of the Web pages, we did 
different tests for the sites in Table 1. In order to isolate those factors relating strictly 
to the composition of the Web page, we also analysed the structure of different Web 
pages from the same site. In this way, we could account for the influence of the
network path and the performance of the server machine. These tests and their results 
are presented in the next section. 



Table 1. Statistics about the composition of the studied Web pages

Sites            Total size   Html file   Images     Other      Number of
                 (KB)         size (%)    size (%)   size (%)   Images
Web Server 1     368.5         1.06       98.94       0          2
Web Server 2     331.6         4.70       88.53       6.77      90
Web Server 3     136.9         8.66       73.90      17.44      26
Web Server 4      71.3         3.53       76.26      20.22      13
Web Server 5      72.0         9.30       90.70       0          8
Web Server 6     113.2         0.75       51.01      48.24      59
Web Server 7      57.1        17.27       68.53      14.11      14
Web Server 8      78.9         0.23       86.33      13.44      28
Web Server 9     117.6         7.28       92.71       0          6
Web Server 10     86.6        13.69       85.83       0.48      43
Web Server 11     51.5        13.66       33.39      52.95      88
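
The percentages in Table 1 can be converted into absolute image sizes, which makes
the contrast between pages with a few large images and pages with many small images
easier to see. The following is a small sketch using three rows of Table 1; the
function name is illustrative.

```python
def image_stats(total_kb: float, image_pct: float, n_images: int) -> tuple[float, float]:
    """Return (total image size in KB, average size per image in KB)."""
    image_kb = total_kb * image_pct / 100
    return image_kb, image_kb / n_images


# (total size in KB, images size in %, number of images), taken from Table 1.
pages = {
    "Web Server 1": (368.5, 98.94, 2),
    "Web Server 5": (72.0, 90.70, 8),
    "Web Server 11": (51.5, 33.39, 88),
}

for site, row in pages.items():
    image_kb, per_image_kb = image_stats(*row)
    print(f"{site}: {image_kb:.1f} KB of images, {per_image_kb:.2f} KB per image")
```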



3 Experimental Results 

For our experiments we used the commercial sites presented in Table 1. These sites 
span a range of sizes, but all of them contain HTML, images, and other types of files. 
Download times are measured using a tool developed in the Performance Engineering 
Laboratory at Dublin City University [1, 2]. In our first experiment, we measured the 
effect of images on download times by comparing download times of the main HTML 
file for each site with that of the main page in its entirety. In our second experiment, 
we measured the effect the number of images had on download time under different 
network and server loadings. Different loadings are achieved by taking measurements 
throughout the working day. In our third experiment, we attempt to isolate the effect 
of page composition on performance from other factors of network and server 
loading. This is achieved by testing different Web pages from the same site. 



3.1 Experiment 1: Effect of the Images on Download Times 

To find out how much the images of a page influence the download time, we made a 
comparison between the download time of the main page and the download time of 
the main HTML file. As an example, the load times of the main Web page from the 
Web Server 5, with and without the images, are shown in Figure 1. 

We observe that the time necessary to download the main page and all its 
associated files is approximately four times as large as the time to download just the 
main HTML file. This increase is due to the eight images that are part of the page. 
These eight images represent 90.7% of the size of the Web page. In addition to taking 
a long time to download, these images are responsible for a significant increase in 
Web traffic. 

Fig. 1. Comparison of the download time of the main page from Web Server 5 with and
without all its components



3.2 Experiment 2: Sensitivity of Performance to Web Page Composition 

In our second experiment we demonstrate that Web pages with a large number of 
images are more sensitive to network and server loading than those with fewer 
images. We compare measurements for four different pages having a large difference 
in the number of images. Two of the main pages (Server 5 and Server 9) have fewer
than ten images, while the other two pages (Server 11 and Server 2) have around
ninety images each. We periodically monitored the download time for the pages 
during a weekday, between 8:45 am and 6:30 pm. The download time at 8:45 am is 
taken to be a baseline measurement and a growth factor is measured as the ratio of the 
current download time to the download time at 8:45 am. Our results are summarised 
in Figure 2. 

As can be seen, the pages with a large number of images had a much larger 
growth than the pages with a small number of images. This indicates that a large 
number of images can seriously affect Web server performance. 



3.3 Experiment 3: Effect of Number and Size of Images on Download Time 

In Experiment 2, many factors might have influenced the download time including 
not only the Web page composition, but also the server performance, the network 
traffic and the distance from the client to the Web server (although all the Web 
servers are located in Ireland). In order to isolate the effect of page composition on the 
download time, we compare different pages from the same site. Pages are chosen with 
different numbers and/or sizes of images. First we look at pages that have varying 
numbers of images, but are all of similar size and with similar percentage of the size 
being accounted for by images. Second we look at pages where both number and size 
of images vary. In both cases the download times are measured for various server 
loadings, and the relative degradation of performance is obtained. 

Fig. 2. Download time growth for different Web pages during a day



From Web Server 2 we chose three different Web pages with different numbers of 
images (between sixty-eight and ninety), but with similar image sizes as a percentage 
of the total size of the Web page. Because most of the problems of the Web servers' 
performance appear during the peak-hours period, we analysed the response time of 
the server for that period. Server loads were generated by making parallel requests for 
the page. Measurements are taken for 1, 10, 30 and 100 parallel requests. These 
measurements are summarised in Table 2. Growth is defined as the ratio of the current 
download time to the download time for a single client request. 



Table 2. The average download times during the peak hours of different Web pages from Web
Server 2 with a variable number of simultaneous accesses

Web Page   Page Size   Number of   Img. Size   Number of          Average Download   Growth
Number     (KB)        Images      (%)         Parallel Clients   Time (sec)
Page 1     292.5       90          87.51         1                14.88              1.00
Page 1     292.5       90          87.51        10                18.85              1.27
Page 1     292.5       90          87.51        30                25.75              1.73
Page 1     292.5       90          87.51       100                54.39              3.66
Page 2     231.8       75          88.89         1                11.34              1.00
Page 2     231.8       75          88.89        10                12.86              1.13
Page 2     231.8       75          88.89        30                18.89              1.67
Page 2     231.8       75          88.89       100                37.27              3.29
Page 3     209.4       68          89.65         1                12.25              1.00
Page 3     209.4       68          89.65        10                13.50              1.10
Page 3     209.4       68          89.65        30                18.80              1.53
Page 3     209.4       68          89.65       100                39.04              3.19

Fig. 3. Access time growth during peak-hours for different pages from Web Server 2



As the number of clients accessing the same Web page in parallel increases, so does
the growth factor of the download time: more and more requests for the components
of the Web page are sent to the server, overloading it. Comparing Web Page 3 with
Web Page 1, there is a significant difference in download time growth when 100
clients access the page in parallel (3.19 versus 3.66). The growth factors for the
three pages are presented in Figure 3.
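
The growth factor is a plain ratio and can be recomputed from the measured times.
The following sketch uses the Page 1 measurements from Table 2; the function name
is illustrative.

```python
def growth_factors(times_by_clients: dict[int, float]) -> dict[int, float]:
    """Divide each average download time by the single-client time."""
    baseline = times_by_clients[1]
    return {n: round(t / baseline, 2) for n, t in times_by_clients.items()}


# Average download times (sec) for Page 1 of Web Server 2, keyed by the
# number of parallel clients, taken from Table 2.
page1 = {1: 14.88, 10: 18.85, 30: 25.75, 100: 54.39}
print(growth_factors(page1))
```

The result matches the Growth column for Page 1 in Table 2 (1.00, 1.27, 1.73,
3.66).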

We see that Page 1 has the worst performance and it has the largest number of 
images. Page 1 is also slightly larger than Page 2 and Page 3, with a larger image 
size. 



Table 3. The access time of Web pages with various numbers of images from Web Server 8

Web Page   Page Size   Number of   Img. Size   Number of          Average Download   Growth
Number     (KB)        Images      (%)         Parallel Clients   Time (sec)
Page 1     78.9        27          86.29         1                 7.90              1.00
Page 1     78.9        27          86.29        10                 8.83              1.12
Page 1     78.9        27          86.29        30                19.65              2.49
Page 1     78.9        27          86.29       100                42.19              5.34
Page 2     43.5        39          54.37         1                 2.78              1.00
Page 2     43.5        39          54.37        10                 3.98              1.43
Page 2     43.5        39          54.37        30                 7.30              2.62
Page 2     43.5        39          54.37       100                19.09              6.87
Page 3     36.9        17          53.20         1                 2.23              1.00
Page 3     36.9        17          53.20        10                 2.55              1.14
Page 3     36.9        17          53.20        30                 4.01              1.80
Page 3     36.9        17          53.20       100                11.62              5.21

In order to study the effect of the size of the images versus the number of images
on download time, a similar analysis was done for three Web pages from Web Server
8. For these pages, both the size of the images and the number of images vary. The
composition of the Web pages is summarised in Table 3.

A comparison between the download times for Page 2 and Page 3 shows the
influence of the number of images. Although Page 2 and Page 3 are similar in total
size and in image size, at 100 requests Page 2 takes about 1.6 times as long to
download as Page 3 (19.09 s versus 11.62 s). The growth factors for the three pages
are plotted in Figure 4.



Fig. 4. Growth factors for different Web pages for Web Server 8 (x-axis: number of
clients; series: Page 1, Page 2, Page 3)

A comparison of Page 1 and Page 2 indicates that the number of images has a 
greater influence on performance sensitivity than the size of images. Page 2 is 
smaller, but has a greater number of images than Page 1. The growth factors for Page 
2 are consistently larger than those for Page 1. 



4 Conclusions and Future Work 

The results of this study lead to a number of interesting observations about some 
factors that could influence Web Server performance. These factors include: the 
number of images, the total size of the images, a large number of clients accessing the 
Web server simultaneously, and the period of time (peak/off-peak hours) when the 
requests are made. The work reported here suggests that the number of images has a 
disproportionate effect on server performance, particularly when the server is heavily 
loaded. In order to ascertain if our assumption regarding loading patterns is correct, it 
will be necessary to either measure or control the loading of the Web pages. 

Experimental results suggest that images do have a great influence on download 
time. This indicates that designers of Web pages need to find a compromise between 
the look of a page (with lots of attractive pictures) and the performance seen by 
clients of the page (for which download time is a reasonable measure). Many static 
solutions exist to improve download time: for example, a faster Internet connection, a
better-performing server, and smaller Web page sizes. A significant amount of effort
has gone into minimizing image sizes and bandwidth requirements. Much research on
compression algorithms suggests that the size of an image file can be reduced while
keeping good image quality [3]. Also, UC Berkeley's Transend [4], Intel's QuickWeb
[5] and Spectrum's FastLane [6] systems tried to improve access over slow links by
reducing image size via lossy compression, using Web proxies which transform the
images into versions with reduced resolution and color depth. Gilbert and
Brodersen [7] proposed a methodology to improve Web access using a new technique
called global progressive interactive Web delivery, which entails applying progressive
coding to the document transmission process in its entirety.

Another solution is to use DHTML animations created with JavaScript,
MacroMedia’s Shockwave/Flash, or Microsoft’s DirectAnimation instead of image 
files (currently, most of the images on the Web are GIF or JPEG [8]). The effect of 
these files is more spectacular and their use decreases the number of connections 
created between the browser and Web server, thus reducing bandwidth requirements. 
Other possible solutions to improve the download time are presented in [9] where 
ways are suggested to reduce the number of bits each page needs and to make the 
JavaScript code faster. 

We suggest that a class of dynamic solutions should also be considered. For 
example, the Web server could monitor its download times and reduce the amount of 
information sent during peak times. Transmitting only some of the embedded files 
will reduce the Web page content quality. In this case a compromise between the 
quality of the Web page and the performance of the server has to be made. It may also 
be possible for the client to monitor the speed of the download and control how much 
information they want to receive. In this way the client's perception of the Web page 
would take into account the page's size and composition, and how these affect the 
expected waiting time. 
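
One possible form of such a server-side policy is sketched below. It is purely
illustrative and not something evaluated in this study: the function name, the
target_time parameter and the rule of keeping a share of the images proportional to
target_time / measured_time are all assumptions.

```python
def components_to_send(html_file: str, images: list[str],
                       measured_time: float, target_time: float) -> list[str]:
    """Choose which embedded files to send, given recent download times.

    The main HTML file is always sent. When the measured download time
    exceeds the target, only a proportional share of the images is kept.
    """
    if measured_time <= 0 or target_time <= 0:
        raise ValueError("times must be positive")
    if measured_time <= target_time:
        return [html_file, *images]
    keep = int(len(images) * target_time / measured_time)
    return [html_file, *images[:keep]]


images = ["a.gif", "b.gif", "c.gif", "d.gif"]
print(components_to_send("index.html", images, measured_time=20.0, target_time=10.0))
print(components_to_send("index.html", images, measured_time=5.0, target_time=10.0))
```

The order of the images list acts as a priority order here, so a page designer
would place the most important images first.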

Abstract. Current networks integrate multiple transport technologies
to exploit their relative advantages and different functionalities; these
are multilayer networks combining IP/ATM/SDH/WDM technologies.
To provide end-to-end survivability, different restoration mechanisms
at different layers must be combined efficiently in order to achieve
fast restoration, maximum availability and minimum restoration cost.
This paper describes the key issues associated with the implementation
of restoration in a multilayer network. Moreover, the "best" strategies
proposed so far are presented and the current state of standardisation
is discussed. Finally, current unresolved issues and open problems are
highlighted.

Keywords: survivability, multilayer network, resilience issues,
restoration strategies



1 Introduction 

With the deployment of networks combining up to four different technologies (IP,
ATM, SDH/SONET and WDM) for the network layer, the design of end-to-end
survivability in multilayer networks has become a major topic of interest in itself.
A standardised infrastructure for multiple network layers is presented in [?].
New issues associated with survivable multilayer networks arise; these issues
are related to the layering and to the fact that each technology provides its own
restoration mechanism. The layering entails overlapping functionalities and adds
interworking complexity. Issues of paramount importance include the coordination
between the various restoration mechanisms at different layers and the allocation
of spare resources. Also, issues regarding the timing and location of the
restoration process need to be addressed.

A number of projects have been set up, as the issue of survivability in
multilayer networks has attracted significant interest. European projects like ACTS
PANEL, MISA or RACE II IMMUNE led to a better understanding of the issues of
restoration in multilayer networks, and to the design of a general framework for
designing end-to-end survivable networks. A survivability framework can also be
found in [?]. These projects also contributed to the process of standardisation.



P. Lorenz (Ed.): ICN 2001, LNCS 2093, pp. 829-^23 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001

The remainder of the paper compares the results of different projects and
concludes with a recommendation of the "best" strategy for certain purposes.
In spite of the encouraging results obtained so far, it is noted that most of the
strategies developed have only been demonstrated as proposals and are not yet
implemented in real-life networks. In other words, standardisation is still
in its infancy. Some standards that have been initiated are highlighted.

While the major issues are well known, some problems still remain unresolved 
and preclude the restoration design from being cost effective and satisfactory. 
Interesting unresolved issues are expressed in this paper. 



2 Comparisons of Strategies 

End-to-end survivability in multilayer networks implies consideration of the
restoration mechanism, i.e. the restoration algorithms or the deployment of 1+1,
1:1 or 1:N protection¹. Survivability is designed to enable the recovery of traffic
flows affected by network failures.

Resources are added to allow for the traffic to be carried over spare resources 
until the failed equipment is repaired. Allocating spare capacity is complex as 
such capacity is needed at each layer and becomes even more complex when 
sharing capacity between layers is considered. The cost of spare capacity should 
be kept to a minimum while simultaneously ensuring maximum survivability. A 
further cost of restoration is the exchange of restoration messages that require 
extra overhead management. 

The allocation of spare capacity is closely related to the strategy for
coordinating the recovery mechanisms, as well as to the failure scenarios
considered. The coordination of recovery mechanisms, also known as escalation [?],
is essential to provide complete restoration at minimum cost. Additionally, the
coordination seeks to achieve minimal restoration time. Thus, the role of the
Telecommunications Management Network (TMN) is considered. Finally, any strategy
needs to take into account different granularities which potentially have different
values at each layer. Granularity characteristics involve issues such as the time
scale for the restoration, which varies between milliseconds and minutes (temporal
granularity); whether the rerouting is processed at the packet level or at the
wavelength level (bandwidth granularity); and whether aggregated or individual
traffic classes are considered (QoS granularity).

A number of strategies have been proposed. First of all, the issues of where
and when to trigger the recovery mechanisms are considered. It is stressed at
this point that before considering restoration in a multilayer network, the most
suitable restoration strategy must be selected at each layer. For ATM and SDH
technologies, [?] report an analysis for the choice of the recovery strategy.

At the optical layer, different protection strategies are referenced in [?]. Lastly,
other references can be found in [?]. Restoration mechanisms are required at each
layer, e.g. to cope with failures of cross-connects at each layer. To avoid
deadlock situations and resource competition, recovery mechanisms at each layer
need to be coordinated [?]. So far, two types of strategy have been considered
to achieve end-to-end survivability: the "lowest layer recovery strategy" and
the "highest layer recovery strategy" [?]. The "lowest layer recovery
strategy" starts the restoration at the layer closest to the failure location. The
"highest layer recovery strategy", instead, starts the restoration at the layer
closest to the origin of the traffic.

¹ Dedicated resources are allocated. For 1+1, traffic is sent on both the working
and backup paths simultaneously. For 1:1 and 1:N, the traffic is sent on the backup
path after a failure in the working path has been detected. 1:1 corresponds to 1
backup link for 1 working link, whereas 1:N corresponds to 1 backup link for N
working links.

In the ACTS PANEL Project, both strategies have been compared and the 
“lowest layer recovery strategy” appeared to be the most efficient in both achiev- 
ing faster restoration and lower cost resources. 

However, this strategy is not the best suited for providing survivability in
multilayer networks. Indeed, it is more complex to implement, as coordination
between the recovery mechanisms within each layer is required [?]. Moreover,
signaling messages for the coordination must be defined [?]. After the detection of
defects, recovery mechanisms must be triggered by the intervention of the TMN to
prevent the undesirable triggering of mechanisms at other layers before the server
layer has the opportunity to complete the restoration.

The coordination between the recovery mechanisms is realised by delaying
the activation of the recovery mechanisms at higher layers. Two means have
been employed: the hold-off time and the recovery token. Demonstrations in
the labs led to the conclusion that the use of a hold-off time enables a faster
restoration. Therefore, a satisfactory strategy to tackle the problem of
end-to-end survivability would be to implement a hold-off timer which delays
the activation of the higher layer recovery. Thus a maximum amount of traffic
is restored by the lowest layer. The hold-off time is set to prevent deadlock
situations, and can be modified in the TMN within the Performance management
area.
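The effect of the hold-off timer can be sketched as follows. This is a
simplified model under stated assumptions: two layers only, a failure detected
by both layers at time zero, and hypothetical parameter names; it is not the
mechanism as specified in PANEL.

```python
def restore_with_hold_off(server_restore_ms, hold_off_ms, client_restore_ms):
    """Return (restoring_layer, completion_time_ms) for a fault detected at t = 0.

    server_restore_ms -- time the server (lowest) layer needs to restore,
                         or None if it cannot restore the traffic
    hold_off_ms       -- delay before the client (higher) layer may act
    client_restore_ms -- time the client layer needs once it has started
    """
    if server_restore_ms is not None and server_restore_ms <= hold_off_ms:
        # The server layer completes before the timer expires,
        # so the client-layer mechanism is never triggered.
        return ("server", server_restore_ms)
    # The timer expired without a server-layer success:
    # the client layer takes over after the hold-off delay.
    return ("client", hold_off_ms + client_restore_ms)
```

With a 100 ms hold-off time, `restore_with_hold_off(50, 100, 200)` leaves the
restoration to the server layer, while `restore_with_hold_off(None, 100, 200)`
hands it to the client layer, which completes 300 ms after detection.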

The PANEL project considered an integrated management system that
enables the coordination of the overlay layers. An SDL tool from the University
of Munich (SELANE) was used, among other things, to model the integrated
management system. An Integrated Network Management (INM) system can be
modeled to act as an end-to-end management control of the layered network.
The management system referred to the Q3 ATM and SDH management
interfaces developed during the ACTS MISA project.

Within BT, a protocol was developed to achieve end-to-end survivability in 
hybrid ATM/SDH networks. 

In their work, Veitch et al. dealt with the restoration of traffic in an
ATM/SDH network, considering an approach that is similar to the “highest layer
recovery strategy” developed by the PANEL Group and corresponds to a groomed
multilayer policy. Restoration processes were decoupled and processed
concurrently. Only physical failures were considered, as they are the most
likely to occur in practice. The developed protocol proved fully resilient
against physical failures. However, the time needed for restoration was not
evaluated.
In other experiments, the cost in terms of required spare capacity was
compared for different restoration strategies. Veitch et al. showed that it
is cheaper to restore client connections at the client layer rather than at the
server layer, and that this strategy becomes even cheaper when the number of
client cross-connects increases. These results contradict those obtained
in the PANEL project. However, it must be stressed that the experiments only
considered single link failures, whereas both link and cross-connect failures
are taken into account in PANEL. Besides, in PANEL the number of overlays of
different technologies was considered small, whereas in BT, a full overlay of
one technology on top of another was considered. One drawback was pointed
out: restoration in the highest layer results in a higher number of connections
to restore^; hence the time to restore might be long, and the management of
the restoration messages complex. Therefore, to obtain a coarser granularity,
a proposition of grouping the client connections was made (consisting of VPGs
(Virtual Path Groups) for ATM VPs).

In the BT approach the spare capacity was not a major concern. Requirements
of each layer were calculated separately. As the demand for spare capacity
from higher layers must be carried by lower layers, the cost can be very high.
To lower the cost, the PANEL Group proposed to share the spare capacity
between layers. The concept is meant to be generic and is known as the “common
pool of capacity”. In the PANEL project, the common pool was proposed for
ATM/SDH and SDH/WDM layered networks. Comparisons of costs showed that the
common pool allows important savings. However, the concept considers only a
single failure, a cable cut or a cross-connect failure. Moreover, the common
pool applies to the “lowest layer recovery strategy”. The concept is possible
when the spare capacity allocated at the client layer is considered a
pre-emptible resource. Therefore, a protocol to perform the pre-emption is
required. A “squelching” mechanism, which informs the client layer of the
unavailability of its spare resources and thereby enables the pre-emption, is
standardised for SDH rings.
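The saving offered by a common pool can be illustrated with a small
calculation. The model below is an assumption made for illustration: on each
server-layer link, separately dimensioned spare capacities add up, whereas
with a common pool the pre-emptible client spare capacity is carried inside
the server spare capacity, so only the larger of the two is needed. The
function name and the per-link figures are hypothetical.

```python
def spare_capacity(server_spare, client_spare, common_pool):
    """Per-link spare capacity to provision in the server layer.

    server_spare -- spare units per link reserved for server-layer recovery
    client_spare -- client-layer spare units per link, carried by the server
    common_pool  -- True if client spare is pre-emptible and shares the
                    server-layer spare capacity (single-failure assumption)
    """
    if len(server_spare) != len(client_spare):
        raise ValueError("both lists must describe the same links")
    if common_pool:
        # Shared pool: the larger requirement covers both uses.
        return [max(s, c) for s, c in zip(server_spare, client_spare)]
    # Separate dimensioning: both requirements are provisioned.
    return [s + c for s, c in zip(server_spare, client_spare)]
```

For two links with server spare `[4, 2]` and client spare `[3, 5]`, separate
dimensioning provisions 14 units in total, against 9 units with a common pool.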

Both BT and PANEL approaches provided frameworks to consider
survivability in multilayer networks. A guideline resulted from the PANEL
project: NIG-G5 ACTS, “Towards Resilient Networks and Services”.

Moreover, the PANEL project contributed to the work of the ITU-T SG13
Q19/13 WP3/13 working on the adaptation of the SDH layer to OTN layer
networks. Besides, PANEL has influenced the work of the ETSI TM3 WG13,
which released the DTR/TM-3025 paper related to the hold-off time
functionality in MSP and MS-SPRing. More information concerning the hold-off
timer is found in the draft technical report “Enhanced Network Survivability
Performance” from the Working Group T1A1.2, released in November 2000.

Recently, more consideration has been given to restoration in optical IP
networks. Survivability considerations for such networks are available in the
literature. Moreover, restoration in optical IP networks has been described
for different networking architectures. Rather than considering restoration
in each networking layer (optical and IP) independently, it has been shown
that integrated architectures are more cost effective in terms of required
equipment. Moreover, analogous to the PANEL project, where “highest layer
recovery” and “lowest layer recovery” schemes were compared, the cost
effectiveness of restoration in a “service layer” architecture and in a
“transport layer” architecture has been compared. The conclusion drawn was
that the latter is the most suitable for large IP networks, providing better
restoration performance than the service-layer architecture, and at an
efficient cost.

^ due to the finer bandwidth granularity of the highest layer

Clearly, significant work has already been carried out and major results have
been achieved. However, some issues are still unresolved. The next section
discusses some of those.

3 Open Issues 

A study of the approaches presented in the literature reveals several open
issues. The following list presents problems that were identified during the
various studies and are still unresolved. Initial work has been carried out to
address some of them. Clearly, this list is not exhaustive, but some of the
major issues are included. The issues can be classified into three categories.

1. Rapid detection of the fault, crucial for a restoration strategy based on a 
“highest layer recovery” approach. 

2. Management of the restoration mechanisms for an end-to-end survivability 
based on a re-routing approach. 

3. Minimisation of the cost of spare capacity. 

The problem of detection of failures in IP networks is still an issue, as with
current protocols the time it takes to detect a failure at the IP layer is
still typically measured in tens of seconds. Such a long time-to-detect is
harmful to end-to-end restoration. First of all, a long detection time might
affect a large amount of traffic, hence degrading the quality and continuity
of service for many users. Secondly, once the IP layer detects the fault, the
recovery actions triggered may interfere with those at lower layers, which had
detected the failure much earlier but did not have time to complete the
restoration. If the restoration is left to the IP layer, the detection and
localisation of the failure may take much longer, due to the time scales used
(hello and keep-alive timers). Generally, the timing aspect is an issue for
layered networks. The timing problem has been addressed in a paper that
summarizes issues related to IP over optical networks and gives some timing
parameters to perform a coordinated end-to-end restoration.
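The order of magnitude of the detection time follows from the timers
themselves. The sketch below assumes a generic hello protocol in which a
neighbour is declared down once no hello has been received for a full dead
interval; jitter and transmission delays are ignored, and the function name
and the example values are illustrative assumptions.

```python
def detection_window(hello_interval_s, dead_interval_s):
    """Return (earliest, latest) failure detection time in seconds.

    A neighbour is declared down when no hello has arrived for
    dead_interval_s. The dead timer restarts at each received hello.
    """
    if dead_interval_s <= hello_interval_s:
        raise ValueError("dead interval must exceed the hello interval")
    # Failure just before the next hello was due: the dead timer has
    # already been running for almost one hello interval.
    earliest = dead_interval_s - hello_interval_s
    # Failure right after a hello was received: the full dead
    # interval must elapse before the neighbour is declared down.
    latest = dead_interval_s
    return (earliest, latest)
```

With a hello interval of 10 s and a dead interval of 40 s, the failure is
detected between 30 s and 40 s after it occurs, which is consistent with the
tens of seconds mentioned above.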

End-to-end survivability generally relies upon an end-to-end management
control of the layered network. The coordination of the recovery mechanisms
often requires the intervention of an integrated management system. Some
failure scenarios cannot be effectively resolved without the intervention of
an integrated management system. Correlation between alarms at different
network layers is necessary. The MISA project describes a model for the
ATM/SDH case. Standardisation of protocols and a generic approach are required
for implementation in real networks.

Because a physical failure is undesirably detected at several layers, the
TMN might not locate the origin of the fault. Upon a failure at the physical
layer, the propagation of the failure might preclude the TMN from locating the
failure between the physical layer and the cross-connects of higher layers.
The issue is to prevent the activation of the recovery mechanisms at higher
layers when only the lower layer recovery is needed.

Another consideration is that the optical layer cannot, in general, detect
faults at higher layers. Therefore, that layer might not be able to provide a
true protection. The integrated TMN, for instance, could inform the optical
layer in order to trigger the recovery mechanism at this layer.

The coordination between restoration mechanisms at different layers is
possible by means of an escalation of the restoration between layers.
Escalation of restoration at different layers is sometimes based on the use of
hold-off timers, which prevent duplicated and overlapping recovery actions.
The use of a hold-off timer may, in some cases, slow down the restoration
process. The delay introduced by a hold-off timer used for the coordination
between the recovery mechanisms at the different layers could be cancelled
whenever the server layer recovery has failed. A proposition has been made to
add new OAM signals from the server to the client layer, which neutralise the
hold-off time in order to immediately trigger the recovery at the client layer.
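The benefit of such a signal can be sketched as follows. The function and
parameter names are hypothetical, and the model simply assumes that the client
layer starts its recovery at whichever comes first: the expiry of the hold-off
timer or the arrival of a notification that the server layer recovery has
failed.

```python
def client_recovery_start(hold_off_ms, server_failed_at_ms=None):
    """Time (ms after fault detection) at which client recovery starts.

    hold_off_ms         -- configured hold-off time
    server_failed_at_ms -- arrival time of a signal reporting that the
                           server layer recovery failed, or None if no
                           such signal exists or is received
    """
    if server_failed_at_ms is not None and server_failed_at_ms < hold_off_ms:
        # The signal neutralises the remaining hold-off time.
        return server_failed_at_ms
    # Without an earlier signal, the client waits for the timer to expire.
    return hold_off_ms
```

With a 100 ms hold-off time and a failure notification received after 30 ms,
the client layer starts 70 ms earlier than it would by waiting for the timer.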

The use of a hold-off timer does not always prevent a fault from propagating
to higher layers. When the optical layer detects a fault, it cannot prevent
the fault from being propagated to higher layers, causing the activation of
the recovery mechanisms at these layers when these are not required. Such a
case has been demonstrated in SDH/WDM networks. This situation should be
avoided to prevent deadlock.

Also, although the use of a hold-off timer could resolve the problem of
contention, it delays the completion of the restoration. An issue is to
determine whether the standardisation of hold-off timers, as has been done for
SDH ring networks, is useful for other technologies.

Another unsolved issue applies to the “highest layer recovery” scheme. This
strategy, detailed in previous sections, enables a single restoration mechanism
for the traffic generated at different layers. Therefore, the restoration of
affected traffic generated at a higher layer and carried over trunks at a lower
layer relies upon the restoration scheme of the higher layer. The issue arises
when the backup path at the higher layer must be found. Correlation of the
routing tables between the server and client layers is necessary so that
client working and backup paths are physically disjoint. If dynamic routing is
considered, protocols must enable the correlation.
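The correlation requirement can be made concrete with a small check. The
sketch assumes that a mapping from each client-layer link to the server-layer
links that carry it is available; the mapping, the link names and the function
name are hypothetical.

```python
def physically_disjoint(working, backup, client_to_server):
    """Return True if two client paths share no server-layer link.

    working, backup  -- lists of client-layer link names
    client_to_server -- dict mapping a client link to the set of
                        server-layer links (e.g. fibres) that carry it
    """
    def server_links(path):
        # Collect every server-layer link used by the client path.
        links = set()
        for client_link in path:
            links.update(client_to_server[client_link])
        return links

    return server_links(working).isdisjoint(server_links(backup))


# Hypothetical topology: client links A-B and A-C both ride on fibre f1.
mapping = {
    "A-B": {"f1"},
    "A-C": {"f1", "f2"},
    "C-B": {"f3"},
    "A-D": {"f4"},
    "D-B": {"f5"},
}
```

Here the backup path `["A-C", "C-B"]` looks disjoint from the working path
`["A-B"]` at the client layer, yet both depend on fibre `f1`, so the check
fails; the backup path `["A-D", "D-B"]` passes.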

Finally, the last type of unsolved issue lies in the use of spare resources to 
enable the end-to-end restoration. 

In order to minimise the cost of resources, the concept of sharing capacity
between layers was developed. Also, grouping the connections at the higher
layer could minimise the number of alarms generated, and speed up the
restoration. How is this grouping traded off against the spare capacity
requirements, i.e. the extra cost, resulting from a coarser protection
granularity?

In most approaches the spare capacity is calculated in a top-down approach.
Feedback from lower layers should be considered when the higher-layer
requirements are computed. This would enable the optimisation of spare
resources. A feedback loop was used in some cases during PANEL, but its
complexity and run-time performance make it difficult to use. New mechanisms
should make the feedback loop more attractive.

Moreover, a cost additional to that of the transmission spare capacity must
be considered: the cost varies when different ports are used to separate spare
and working resources. The separation of working and spare capacity enables
the sharing of spare capacity between restoration strategies at different
layers, at the expense of an additional port cost. An evaluation of the spare
capacity savings compared with the cost of additional ports would be
interesting.
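The trade-off can be written as a simple balance. The linear cost model and
all names and figures below are assumptions made for illustration; real
equipment costs need not be linear.

```python
def net_saving(capacity_saved, cost_per_capacity_unit,
               extra_ports, cost_per_port):
    """Net saving of sharing spare capacity between layers.

    A positive result means the spare capacity saved outweighs the
    cost of the additional ports needed to separate working and
    spare resources (linear cost model, assumed for this sketch).
    """
    return (capacity_saved * cost_per_capacity_unit
            - extra_ports * cost_per_port)
```

For instance, saving 10 capacity units at a unit cost of 5.0 while adding
4 ports at 8.0 each gives `net_saving(10, 5.0, 4, 8.0)`, a net saving of 18.0.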

4 Conclusion 

Different strategies for providing end-to-end survivable multilayer networks
have been discussed. From the results of the experiments, comparisons of the
different strategies were possible, but no single “best” strategy can be
derived. Further studies are necessary to consider different degrees of
integration of a technology on top of others.

The problem of survivability in multilayer networks has been well defined
and the main issues have been identified. Some strategies have influenced the
work of the standards organisations, leading to the release of various
guidelines.

Nevertheless, some issues are still unsolved, leaving the problem of
survivability in multilayer networks open. Three types of issues have been
highlighted: detection of the failure, management of restoration mechanisms,
and efficient use of spare resources. The detection of a failure must be
prompt to enable a fast restoration of the affected traffic. Since long
intervals occur in IP networks between the messages that report the state of
the links, it is not currently feasible to achieve a fast restoration by
relying only on the restoration at this layer. Moreover, management issues
arise when re-routing strategies are used in the end-to-end survivability
implementation. These issues concern the use of an integrated management
system, which requires standards, as well as correlation mechanisms between
topologies at different layers when the restoration is left to higher layers.
Finally, multi-layer restoration implies a cost in resources, which is not
only a transmission capacity cost but also an extra cost in the number of
ports. Comparisons of restoration strategies that include the nodal cost
(ports and backplanes) would be of interest.